Preface


The thought of writing this book began with the realization that not a single book 
existed with the title HDR? Duh! While we rejected the idea of this title shortly after, 
both the idea for this book and its title matured and now you have the final result 
in your hands. 

High dynamic range imaging is an emerging field, and for good reasons. You are 
either already convinced about that, or we hope to convince you with this book. At 
the same time, research in this area is an amazing amount of fun, and we hope that 
some of that shines through as well. 

Together, the four authors are active in pretty much all areas of high dynamic 
range imaging, including capture devices, display devices, file formats, dynamic 
range reduction, and image-based lighting. This book recounts our experience with 
these topics. It exists in the hope that you find it useful in some sense. 

The visual quality of high dynamic range images is vastly higher than that of
conventional low-dynamic-range images. The difference is as big as the difference between
black-and-white and color television. Once the technology matures, high dynamic 
range imaging will become the norm rather than the exception. It will not only 
affect people in specialized fields such as film and photography, computer graphics, 
and lighting design but will affect everybody who works with images. 

High dynamic range imaging is already gaining widespread acceptance in the 
film industry, photography, and computer graphics. Other fields will follow soon. 
In all likelihood, general acceptance will happen as soon as high dynamic range 
display devices are available for the mass market. The prognosis is that this may be
only a few years away.

At the time of writing, there existed no single source of information that could 
be used both as the basis for a course on high dynamic range imaging and as a work 
of reference. With a burgeoning market for high dynamic range imaging, we offer 
this book as a source of information for all aspects of high dynamic range imaging,
including image capture, storage, manipulation, and display. 
Introduction


There are many applications that involve 
digital images. They are created with mod- 
ern digital cameras and scanners, rendered 
with advanced computer graphics tech- 
niques, or produced with drawing pro- 
grams. These days, most applications rely 
on graphical representations of some type. 

During their lifetime, digital images undergo a number of transformations. First, 
they are created using one of the previously cited techniques. Then they are stored 
on a digital medium, possibly edited via an image-processing technique, and ulti- 
mately displayed on a computer monitor or printed as hardcopy. 

Currently, there is a trend toward producing and using higher-resolution images. 
For example, at the time of writing there exist consumer-level digital cameras that 
routinely boast 5- to 6-megapixel sensors, with 8- to 11-megapixel sensors avail- 
able. Digital scanning backs routinely offer resolutions that are substantially higher. 
There is no reason to believe that the drive for higher-resolution images will abate 
anytime soon. For illustrative purposes, the effect of various image resolutions on 
the visual quality of an image is shown in Figure 1.1. 

Although the trend toward higher-resolution images is apparent, we are at the 
dawn of a major shift in thinking about digital images, which pertains to the range 
of values each pixel may represent. Currently, the vast majority of color images are
represented with a byte per pixel for each of the red, green, and blue channels.
With three bytes per pixel, more than 16 million different colors can be assigned
to each pixel. This is known in many software packages as “millions of colors.”
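As a quick check of this arithmetic, the following sketch counts the values available with one byte per channel. The variable names are ours and are not taken from any particular software package.

```python
# Count the distinct colors representable with one byte per color channel.
BITS_PER_CHANNEL = 8
CHANNELS = 3  # red, green, and blue

levels_per_channel = 2 ** BITS_PER_CHANNEL      # values available per channel
total_colors = levels_per_channel ** CHANNELS   # combinations across channels

print(f"{levels_per_channel} levels per channel")
print(f"{total_colors:,} distinct colors per pixel")
```

The large total comes entirely from combining channels; each individual channel still has only a few hundred steps.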

This may seem to be an impressively large number at first, but it should be 
noted that there are still only 256 values for each of the red, green, and blue com- 
ponents of each pixel. Having just 256 values per color channel is inadequate for 
FIGURE 1.1 Increasing the number of pixels in an image reduces aliasing artifacts. The image
on the left has a resolution of 128 by 96 pixels, whereas the image on the right has a resolution 
of 1,024 by 700 pixels. 


representing many scenes. An example is shown in Figure 1.2, which includes an 
automatically exposed 8-bit image on the left. Although the subject matter may be 
unusual, the general configuration of an indoor scene with a window is quite com- 
mon. This leads to both bright and dark areas in the same scene. As a result, in 
Figure 1.2 the lake shown in the background is overexposed. 

The same figure shows on the right an example that was created, stored, and 
prepared for printing with techniques discussed in this book. In other words, it is a 
high-dynamic-range (HDR) image before the final display step was applied. Here, 
the exposure of both the indoor and outdoor areas has improved. Although this 


FIGURE 1.2 Optimally exposed conventional images (left) versus images created with techniques 
described in this book (right). 
FIGURE 1.3 A conventional image is shown on the left, and an HDR version is shown on the
right. The right-hand image was prepared for display with techniques discussed in Section 7.2.7.


image shows more detail in both the dark and bright areas, this is despite the fact 
that this image is shown on paper so that the range of values seen is not higher than 
in a conventional image. Thus, even in the absence of a display device capable of 
displaying them, there are advantages to using HDR images. The difference between 
the two images in Figure 1.2 would be significantly greater if the two were displayed 
on one of the display devices discussed in Chapter 5. 

A second example is shown in Figure 1.3. The image on the left is a conventional 
image shot under fairly dark lighting conditions, with only natural daylight being 
available. The same scene was photographed in HDR, and then prepared for display 


CHAPTER 01. INTRODUCTION 5 


FIGURE 1.4 The image on the left is represented with a bit depth of 4 bits. The image on the
right is represented with 8 bits per color channel.


with techniques discussed in this book. The result is significantly more flattering, 
while at the same time more details are visible. 

The range of values afforded by a conventional image is about two orders of 
magnitude, stored as a byte for each of the red, green, and blue channels per pixel. 
It is not possible to directly print images with a much higher dynamic range. Thus, 
to simulate the effect of reducing an HDR image to within a displayable range, we 
reduce a conventional photograph in dynamic range to well below two orders of 
magnitude. As an example, Figure 1.4 shows a low-dynamic-range (LDR) image (8 
bits per color channel per pixel), and the same image reduced to only 4 bits per 
Condition                                  Illumination (in cd/m$^2$)
Starlight                                  $10^{-3}$
Moonlight                                  $10^{-1}$
Indoor lighting                            $10^{2}$
Sunlight                                   $10^{5}$
Max. intensity of common CRT monitors      $10^{2}$


TABLE 1.1 Ambient luminance levels for some common lighting environments (from
Wandell’s book Foundations of Vision [135]).


color channel per pixel. Thus, fewer bits means a lower visual quality. Although for 
some scenes 8 bits per color channel is enough, there are countless situations in 
which 8 bits is not enough. 
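The bit-depth reduction used for this illustration can be sketched as follows. This is a minimal example under our own assumptions (NumPy, an 8-bit input array, and a hypothetical helper named `reduce_bit_depth`); it is not necessarily how the figure was produced.

```python
import numpy as np

def reduce_bit_depth(image_8bit: np.ndarray, bits: int) -> np.ndarray:
    """Requantize an 8-bit image to `bits` bits per channel.

    The result is rescaled to the 0..255 range so that it can be viewed
    next to the original.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    # Discard the least significant bits.
    coarse = image_8bit.astype(np.uint8) >> (8 - bits)
    # Spread the remaining levels evenly over 0..255 again.
    max_level = (1 << bits) - 1
    return np.round(coarse.astype(np.float64) * 255.0 / max_level).astype(np.uint8)

# A smooth gray ramp makes the loss of levels easy to see.
ramp = np.arange(256, dtype=np.uint8)
ramp_4bit = reduce_bit_depth(ramp, 4)

print(len(np.unique(ramp)), "levels at 8 bits")
print(len(np.unique(ramp_4bit)), "levels at 4 bits")
```

On a smooth gradient, the smaller number of levels shows up as visible banding.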

One of the reasons for this is that the real world produces a much greater range 
than the two orders of magnitude common in current digital imaging. For instance, 
the sun at noon may be 100 million times brighter than starlight [34,120]. Typ- 
ical ambient luminance levels for commonly encountered scenes are outlined in 
Table 1.1.¹

The human visual system is capable of adapting to lighting conditions that vary 
by nearly 10 orders of magnitude [34]. Within a scene, the human visual system 
functions over a range of about five orders of magnitude simultaneously. 
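To make the notion of orders of magnitude concrete, the sketch below converts luminance ratios into powers of ten. It assumes the approximate power-of-ten values of Table 1.1; the dictionary and function names are ours.

```python
import math

# Approximate luminance levels in cd/m^2 (powers of ten, as in Table 1.1).
luminance = {
    "starlight": 1e-3,
    "moonlight": 1e-1,
    "indoor lighting": 1e2,
    "sunlight": 1e5,
}

def orders_of_magnitude(brightest: float, darkest: float) -> float:
    """Return the base-10 logarithm of the ratio between two luminances."""
    return math.log10(brightest / darkest)

span = orders_of_magnitude(luminance["sunlight"], luminance["starlight"])
print(f"Sunlight versus starlight: about {span:.0f} orders of magnitude")
```

A ratio of 100 million thus corresponds to eight orders of magnitude, well beyond the roughly two orders available in conventional imaging.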

This is in stark contrast to typical CRT (cathode-ray tube) displays, which are 
capable of reproducing about two orders of magnitude of intensity variation. Their 
limitation lies in the fact that phosphors cannot be excited beyond a given limit. 
For this reason, 8-bit digital-to-analog (D/A) converters are traditionally sufficient 


1 Luminance, defined in the following chapter, is a measure of how bright a scene appears. 




for generating analog display signals. Higher bit depths are usually not employed, 
because the display would not be able to reproduce such images at levels that are 
practical for human viewing.²

A similar story holds for typical modern liquid crystal displays (LCD). Their 
operating range is limited by the strength of the backlight. Although LCD displays 
tend to be somewhat brighter than CRT displays, their brightness is not orders of 
magnitude greater. 

In that current display devices are not capable of reproducing a range of lumi- 
nances anywhere near the capability of the human visual system, images are typ- 
ically encoded with a byte per color channel per pixel. This encoding normally 
happens when the image is captured. This situation is less than optimal because 
much of the information available in a scene is irretrievably lost at capture time. 

A preferable approach is to capture the scene with a range of intensities and level 
of quantization representative of the scene, rather than matched to any display de- 
vice. Alternatively, images should at a minimum contain a range of values matched 
to the limits of human vision. All relevant information may then be retained until 
the moment the image needs to be displayed on a display device that cannot repro- 
duce this range of intensities. This includes current CRT, LCD, and plasma devices, 
as well as all printed media. 

Images that store a depiction of the scene in a range of intensities commensurate 
with the scene are what we call HDR, or “radiance maps.” On the other hand, we 
call images suitable for display with current display technology LDR. 

This book is specifically about HDR images. These images are not inherently 
different from LDR images, but there are many implications regarding the creation, 
storage, use, and display of such images. There are also many opportunities for 
creative use of HDR images that would otherwise be beyond our reach. 

Just as there are clear advantages to using high image resolutions, there are major
advantages in employing HDR data. HDR images and video are matched to the
scenes they depict, rather than the display devices they are meant to be displayed on. 
As a result, the fidelity of HDR images is much higher than with conventional im- 
agery. This benefits most image processing that may be applied during the lifetime 
of an image. 


2 It would be possible to reproduce a much larger set of values on CRT displays at levels too low for humans to perceive. 
FIGURE 1.5 Color manipulation achieved on an HDR capture (left) produced the image on
the right. The left-hand HDR image was captured under normal daylight (overcast sky). The
right-hand image shows a color transformation achieved with the algorithm detailed in
Section 7.2.7.


As an example, correcting the white balance of an LDR image may be difficult 
due to the presence of overexposed pixels, a problem that exists to a lesser extent 
with properly captured HDR images. This important issue, which involves an ad- 
justment of the relative contribution of the red, green, and blue components, is 
discussed in Section 2.6. HDR imaging also allows creative color manipulation and 
better captures highly saturated colors, as shown in Figure 1.5. It is also less impor- 
tant to carefully light the scene with light coming from behind the photographer, 
as demonstrated in Figure 1.6. Other image postprocessing tasks that become eas- 
ier with the use of HDR data include color, contrast, and brightness adjustments. 
FIGURE 1.6 Photographing an object against a bright light source such as the sky is easier with
HDR imaging. The left-hand image shows a conventional photograph, whereas the right-hand
image was created using HDR techniques.


Such tasks may scale pixel values nonlinearly such that parts of the range of val- 
ues require a higher precision than can be accommodated by traditional 8-bit pixel 
encodings. An HDR image representation would reduce precision errors to below 
humanly detectable levels. 
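The loss of precision can be demonstrated with a small experiment. The sketch below is our own illustration: the power-law adjustment, the exponent 2.2, and the helper `adjust_8bit` are assumptions chosen for the example. It darkens a gray ramp, stores the result in 8 bits, and then tries to undo the adjustment.

```python
import numpy as np

def adjust_8bit(values: np.ndarray, exponent: float) -> np.ndarray:
    """Apply a power-law adjustment and requantize the result to 8 bits."""
    normalized = values.astype(np.float64) / 255.0
    return np.round(255.0 * normalized ** exponent).astype(np.uint8)

ramp = np.arange(256, dtype=np.uint8)

# Round trip with 8-bit storage after each step.
darkened = adjust_8bit(ramp, 2.2)
restored = adjust_8bit(darkened, 1.0 / 2.2)

# The same round trip carried out in floating point.
ramp_float = ramp / 255.0
restored_float = (ramp_float ** 2.2) ** (1.0 / 2.2)

print("distinct levels after 8-bit round trip:", len(np.unique(restored)))
print("largest floating-point round-trip error:",
      np.abs(restored_float - ramp_float).max())
```

Several dark input levels collapse to the same 8-bit value after the first step and cannot be separated again, whereas the floating-point version recovers the ramp to within rounding error.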

In addition, if light in a scene can be accurately represented with an HDR image, 
such images may be effectively used in rendering applications. In particular, HDR 
images may be used as complex light sources that light conventionally modeled 3D 
geometry. The lighting effects thus obtained would be extremely difficult to model 
in any other way. This application is discussed in detail in Chapter 9. 

Further, there is a trend toward better display devices. The first prototypes of HDR 
display devices have been around for at least two years at the time of writing [114, 
115] (an example is shown in Figure 1.7). Their availability will create a much 
larger market for HDR imaging in general. In that LDR images will look no better 
on HDR display devices than they do on conventional display devices, there will be 
an increasing demand for technology that can capture, store, and manipulate HDR 
data directly. 
FIGURE 1.7 Sunnybrook Technologies prototype HDR display device.


It is entirely possible to prepare an HDR image for display on an LDR display 
device, as shown in Figure 1.2, but it is not possible to reconstruct a high-fidelity 
HDR image from quantized LDR data. It is therefore only common sense to create 
and store imagery in an HDR format, even if HDR display devices are ultimately not 
used to display it. Such considerations (should) play an important role, for instance, 
in the design of digital heritage and cultural archival systems. 

When properly displayed on an HDR display device, HDR images and video sim- 
ply look gorgeous! The difference between HDR display and conventional imaging 
is easily as big a step forward as the transition from black-and-white to color television.
For this reason alone, HDR imaging will become the norm rather than the
exception, and it was certainly one of the reasons for writing this book. 

However, the technology required to create, store, manipulate, and display HDR 
images is only just emerging. There is already a substantial body of research available 
on HDR imaging, which we collect and catalog in this book. The following major 
areas are addressed in this book. 


Light and color: HDR imaging borrows ideas from several fields that study light and 
color. The following chapter reviews several concepts from radiometry, photom- 
etry, and color appearance and forms the background for the remainder of the 
book. 


HDR image capture: HDR images may be created in two fundamentally different 
ways. The first method employs rendering algorithms and other computer 
graphics techniques. Chapter 9 outlines an application in which HDR imagery 
is used in a rendering context. 

The second method employs conventional (LDR) photo cameras to capture 
HDR data. This may be achieved by photographing a static scene multiple times, 
varying the exposure time for each frame. This leads to a sequence of images that 
may be combined into a single HDR image. An example is shown in Figure 1.8, 
and this technique is explained in detail in Chapter 4. 

This approach generally requires the subject matter to remain still between 
shots, and toward this end the camera should be placed on a tripod. This lim- 
its, however, the range of photographs that may be taken. Fortunately, several 
techniques exist that align images, remove ghosts, and reduce the effect of lens 
flare, thus expanding the range of HDR photographs that may be created. These 
techniques are discussed in Chapter 4. 

In addition, photo, film, and video cameras will in due course become avail- 
able that will be capable of directly capturing HDR data. As an example, the 
FilmStream Viper is a digital camera that captures HDR data directly. Although 
an impressive system, its main drawback is that it produces raw image data at 
such a phenomenal rate that hard drive storage tends to fill up rather quickly. 
This is perhaps less of a problem in the studio, where bulky storage facilities 
may be available, but the use of such a camera on location is restricted by the 
limited capacity of portable hard drives. It directly highlights the need for efficient
file formats for storing HDR video. Storage issues are discussed further in
Chapter 3.


FIGURE 1.8 Multiple exposures (shown on the right) may be combined into one HDR image
(left).


HDR security cameras, such as the SMaL camera, are also now available (Fig- 
ure 1.9). The main argument for using HDR capturing techniques for security 
applications is that typical locations are entrances to buildings. Conventional 
video cameras are typically not capable of faithfully capturing the interior of a 
building at the same time the exterior is monitored through the window. An 
HDR camera would be able to simultaneously record indoor and outdoor activ- 
ities. 
FIGURE 1.9 SMaL prototype security camera. Image used by permission from Cypress
Semiconductor.


Many consumer-level photo cameras are equipped with 10- or 12-bit A/D 
converters and make this extra resolution available through proprietary RAW³
formats (see Chapter 3). However, 10 to 12 bits of linear data affords about the 
same precision as an 8-bit gamma-compressed format, and may therefore still 
be considered LDR. 
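The multiple-exposure approach described above can be sketched in a few lines. This is a deliberately simplified illustration and not the method of Chapter 4: we assume a static, aligned scene, a linear camera response, known exposure times, and pixel values already scaled to the range 0 to 1. The function name and the hat-shaped weighting are our own choices.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge linear LDR exposures (values in [0, 1]) into a relative radiance map."""
    numerator = np.zeros_like(images[0], dtype=np.float64)
    denominator = np.zeros_like(images[0], dtype=np.float64)
    for image, t in zip(images, exposure_times):
        image = np.asarray(image, dtype=np.float64)
        # Hat weighting: trust mid-range pixels, distrust values near 0 and 1.
        weight = 1.0 - np.abs(2.0 * image - 1.0)
        # Dividing by the exposure time gives an estimate of scene radiance.
        numerator += weight * image / t
        denominator += weight
    # Pixels clipped in every exposure have zero total weight; report 0 there.
    return np.divide(numerator, denominator,
                     out=np.zeros_like(numerator), where=denominator > 0)

# Synthetic scene spanning three orders of magnitude (arbitrary units).
scene = np.array([0.05, 0.5, 5.0, 50.0])
times = [1.0, 0.1, 0.01]
exposures = [np.clip(scene * t, 0.0, 1.0) for t in times]

hdr = merge_exposures(exposures, times)
print(hdr)
```

No single exposure in this example records all four values without clipping, but the weighted combination recovers them because each value is well exposed in at least one frame.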


HDR image representation: Once HDR data is acquired, it needs to be stored in some 
fashion. There are currently a few different HDR file formats emerging. The de- 
sign considerations for HDR file formats include the size of the resulting files, 
the total range that may be represented (i.e., the ratio between the largest representable
number and the smallest), and the smallest step size between successive
values. These trade-offs are discussed in Chapter 3, which also introduces standards
for HDR image storage.


3 RAW image formats are manufacturer- and often model-specific file formats containing unprocessed sensor output.


HDR display devices: Just as display devices have driven the use of 8-bit image 
processing for the last 20 years, the advent of HDR display devices will impact 
the general acceptance of HDR imaging technology. 

Current proposals for HDR display devices typically employ a form of back- 
projection, either in a system that resembles a slide viewer or in technology that 
replaces the single backlight of an LCD display with a low-resolution but HDR 
projective system. The latter technology thus provides HDR display capabilities 
by means of a projector or LED array that lights the LCD display from behind with 
a spatially varying light pattern [114,115]. The display augments this projected 
image with a high-resolution but LDR LCD. These emerging display technologies 
are presented in Chapter 5. 


Image-based lighting: In Chapter 9 we explore in detail one particular application 
of HDR imaging; namely, image-based lighting. Computer graphics is gener- 
ally concerned with the creation of images by means of simulating how light 
bounces through a scene [42,116,144]. In many cases, geometric primitives 
such as points, triangles, polygons, and splines are used to model a scene. These 
are then annotated with material specifications, which describe how light in- 
teracts with these surfaces. In addition, light sources need to be specified to 
determine how the scene is lit. All of this information is then fed to a rendering 
algorithm that simulates light and produces an image. Well-known examples are 
represented in films such as the Shrek and Toy Story series. 

A recent development in rendering realistic scenes takes images as primitives. 
Traditionally, images are used as textures to describe how a surface varies over 
space. As surface reflectance ranges between 1 and 99% of all incoming light, 
the ability of a diffuse surface to reflect light is inherently LDR. It is therefore 
perfectly acceptable to use LDR images to describe things such as wood grain 
on tables or the pattern of reflectance of a gravel path. On the other hand, sur- 
faces that reflect light specularly may cause highlights that have nearly the same 
luminance as the light sources they reflect. In such cases, materials need to be 
represented with a much higher precision. 




FIGURE 1.10 HDR images may be used to light an artificial scene.


In addition, images may be used as complex sources of light within otherwise 
conventional rendering algorithms [17], as shown in Figure 1.10. Here, we can- 
not get away with using LDR data because the range of light emitted by various 
parts of a scene is much greater than the two orders of magnitude available with 
conventional imaging. If we were to light an artificial scene with a representation 
of a real scene, we would have to resort to capturing this real scene in HDR. This 
example of HDR image usage is described in detail in Chapter 9. 


Dynamic range reduction: Although HDR display technology will become generally 
available in the near future, it will take time before most users have made the 
transition. At the same time, printed media will never become HDR because this 
would entail the invention of light-emitting paper. As a result, there will always
be a need to prepare HDR imagery for display on LDR devices. 

It is generally recognized that linear scaling followed by quantization to 8 bits 
per channel per pixel will produce a displayable image that looks nothing like 
the original scene. It is therefore important to somehow preserve key qualities 
of HDR images when preparing them for display. The process of reducing the 
range of values in an HDR image such that the result becomes displayable in 
some meaningful way is called dynamic range reduction. Specific algorithms 
that achieve dynamic range reduction are referred to as tone-mapping, or
tone-reproduction, operators. The display of a tone-mapped image should perceptually
match the depicted scene (Figure 1.11).


FIGURE 1.11 The monitor displays a tone-mapped HDR image depicting the background.
Although the monitor is significantly less bright than the scene itself, a good tone reproduction operator
would cause the scene and the displayed image to appear the same to a human observer.

In that dynamic range reduction requires preservation of certain scene charac- 
teristics, it is important to study how humans perceive scenes and images. Many 
tone-reproduction algorithms rely wholly or in part on some insights of human 
vision, not least of which is the fact that the human visual system solves a similar 
dynamic range reduction problem in a seemingly effortless manner. We survey 
current knowledge of the human visual system as it applies to HDR imaging, 
and in particular to dynamic range reduction, in Chapter 6. 


Tone reproduction: Although there are many algorithms capable of mapping HDR 
images to an LDR display device, there are only a handful of fundamentally dif- 
ferent classes of algorithms. Chapters 7 and 8 present an overview of all currently 
known algorithms, classify them into one of four classes, and discuss their ad- 
vantages and disadvantages. Many sequences of images that show how parameter 
settings affect image appearance for each operator are included in these chapters. 
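To illustrate why linear scaling followed by quantization is inadequate, the sketch below compares it with a very simple logarithmic mapping. The logarithmic mapping is only a stand-in chosen for brevity and is not one of the operators presented in Chapters 7 and 8; the function names and the sample luminances are our own assumptions.

```python
import numpy as np

def quantize(display_values: np.ndarray) -> np.ndarray:
    """Quantize display values in [0, 1] to 8 bits."""
    return np.round(255.0 * np.clip(display_values, 0.0, 1.0)).astype(np.uint8)

def linear_scale(luminance: np.ndarray) -> np.ndarray:
    """Scale so that the brightest value maps to 255, then quantize."""
    return quantize(luminance / luminance.max())

def log_map(luminance: np.ndarray) -> np.ndarray:
    """Map the logarithm of luminance linearly onto the display range.

    Requires strictly positive luminances that are not all equal.
    """
    log_l = np.log10(luminance)
    return quantize((log_l - log_l.min()) / (log_l.max() - log_l.min()))

# Hypothetical scene luminances in cd/m^2, one per order of magnitude.
scene = np.logspace(-3, 5, num=9)

print("linear scaling:     ", linear_scale(scene))
print("logarithmic mapping:", log_map(scene))
```

With linear scaling, most of the darker luminances collapse to the same code value, whereas the logarithmic mapping keeps every sample distinguishable. Neither mapping takes human perception into account, which is what the operators discussed later aim to do.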


Although the concept of HDR imaging is straightforward (i.e., representing 
scenes with values commensurate with real-world light levels), the implications for
all aspects of imaging are profound. In this book, opportunities and challenges with 
respect to HDR image acquisition, storage, processing, and display are cataloged in 
the hope that this contributes to the general acceptance of this exciting emerging 
technology. 


Light and Color 


The emerging field of HDR imaging is di- 
rectly linked to diverse existing disciplines 
such as radiometry, photometry, colorime- 
try, and color appearance — each dealing 
with specific aspects of light and its per- 
ception by humans. In this chapter we dis- 
cuss all aspects of color that are relevant to 
HDR imaging. This chapter is intended to 
provide background information that will 
form the basis of later chapters. 


2.1 RADIOMETRY 


The term scene indicates either an artificial or real environment that may become the 
topic of an image. Such environments contain objects that reflect light. The ability 
of materials to reflect light is called “reflectance.” 

Radiometry is the science concerned with measuring light. This section first 
briefly summarizes some of the quantities that may be measured, as well as their 
units. Then, properties of light and how they relate to digital imaging are discussed. 

Light is radiant energy, measured in joules. Because light propagates through 
media such as space, air, and water, we are interested in derived quantities that 
measure how light propagates. These include radiant energy measured over time, 
space, or angle. The definitions of these quantities and their units are outlined in 
Table 2.1 and should be interpreted as follows. 
Quantity                       Unit                         Definition
Radiant energy ($Q_e$)         J (joule)                    $Q_e$
Radiant power ($P_e$)          J s$^{-1}$ = W (watt)        $P_e = \dfrac{dQ_e}{dt}$
Radiant exitance ($M_e$)       W m$^{-2}$                   $M_e = \dfrac{dP_e}{dA_e}$
Irradiance ($E_e$)             W m$^{-2}$                   $E_e = \dfrac{dP_e}{dA_e}$
Radiant intensity ($I_e$)      W sr$^{-1}$                  $I_e = \dfrac{dP_e}{d\omega}$
Radiance ($L_e$)               W m$^{-2}$ sr$^{-1}$         $L_e = \dfrac{d^2 P_e}{dA \cos\theta \, d\omega}$


TABLE 2.1 Radiometric quantities. The cosine term in the definition of $L_e$ uses the
angle $\theta$ between the surface normal and the angle of incidence, as shown in Figure 2.4.
Other quantities are shown in Figures 2.1 through 2.3.


Because light travels through space, the flow of radiant energy may be measured. 
It is indicated with radiant power or radiant flux and is measured in joules per 
second, or watts. It is thus a measure of energy per unit of time. 

Radiant flux density is the radiant flux per unit area, known as irradiance if we 
are interested in flux arriving from all possible directions at a point on a surface 
(Figure 2.1) and as radiant exitance for flux leaving a point on a surface in all possible 
directions (Figure 2.2). Both irradiance and radiant exitance are measured in watts 
per square meter. These are therefore measures of energy per unit of time as well as 
per unit of area. 

If we consider an infinitesimally small point light source, the light emitted into 
a particular direction is called radiant intensity measured in watts per steradian 
(Figure 2.3). A steradian is a measure of solid angle corresponding to area on the 
unit sphere. Radiant intensity thus measures energy per unit of time per unit of 
direction. 
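These quantities can be related with a small worked example. The sketch below assumes an idealized point source that radiates uniformly in all directions, so that its power is spread over the $4\pi$ steradians of the full sphere, and a small surface patch at some distance from it. The function names are ours.

```python
import math

def radiant_intensity_isotropic(power_watts: float) -> float:
    """Radiant intensity (W/sr) of a point source emitting uniformly in all directions."""
    return power_watts / (4.0 * math.pi)

def irradiance_from_point_source(intensity: float, distance_m: float,
                                 incidence_angle_rad: float = 0.0) -> float:
    """Irradiance (W/m^2) on a small surface patch lit by a point source.

    `incidence_angle_rad` is the angle between the surface normal and the
    direction toward the source.
    """
    return intensity * math.cos(incidence_angle_rad) / distance_m ** 2

intensity = radiant_intensity_isotropic(100.0)            # 100 W source
at_one_meter = irradiance_from_point_source(intensity, 1.0)
at_two_meters = irradiance_from_point_source(intensity, 2.0)

print(f"radiant intensity: {intensity:.3f} W/sr")
print(f"irradiance at 1 m: {at_one_meter:.3f} W/m^2")
print(f"irradiance at 2 m: {at_two_meters:.3f} W/m^2")
```

The intensity of the source does not depend on distance, whereas the irradiance it produces falls off with the square of the distance.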
FIGURE 2.1 Irradiance: power incident upon unit area $dA$.


FIGURE 2.2 Radiant exitance: power emitted per unit area.


FIGURE 2.3 Radiant intensity: power per solid angle $d\omega$. Figure label: "Point" light source.


FIGURE 2.4 Radiance: power incident on a unit surface area $dA$ from a unit set of directions
$d\omega$.
Flux passing through, leaving, or arriving at a point in a particular direction is
known as radiance measured in watts per square meter per steradian (Figure 2.4). 
It is a measure of energy per unit of time as well as per unit of area and per unit 
of direction. Light that hits a point on a surface from a particular direction is at the 
heart of image formation. For instance, the combination of shutter, lens, and sensor 
in a (digital) camera restricts incoming light in this fashion. 

When a picture is taken, the shutter is open for a small amount of time. Dur- 
ing that time, light is focused through a lens that limits the number of directions 
from which light is received. The image sensor is partitioned into small pixels, so 
that each pixel records light over a small area. The light recorded by a pixel may 
be modeled by the “measurement equation” (see, for example, [66] for details). 
Because a camera records radiance, it is therefore possible to relate the voltages ex- 
tracted from the camera sensor to radiance, provided pixels are neither under- nor 
overexposed [104,105]. 

Each of the quantities given in Table 2.1 may also be defined per unit wavelength 
interval, which are then referred to as spectral radiance $L_{e,\lambda}$, spectral flux $P_{e,\lambda}$, and
so on. The subscript e indicates radiometric quantities and differentiates them from 
photometric quantities (discussed in the following section). In the remainder of 
this book, these subscripts are dropped unless this leads to confusion. 

Light may be considered to consist of photons that can be emitted, reflected, 
transmitted, and absorbed. Photons normally travel in straight lines until they hit 
a surface. The interaction between photons and surfaces is twofold. Photons may 
be absorbed by the surface, where they are converted into thermal energy, or they 
may be reflected in some direction. The distribution of reflected directions, given 
an angle of incidence, gives rise to a surface’s appearance. Matte surfaces distribute 
light almost evenly in all directions (Figure 2.5), whereas glossy and shiny surfaces 
reflect light in a preferred direction. Mirrors are the opposite of matte surfaces and
reflect light specularly in almost a single direction. This causes highlights that may
be nearly as strong as light sources (Figure 2.6). The depiction of specular surfaces 
may therefore require HDR techniques for accuracy. 

For the purpose of lighting simulations, the exact distribution of light reflected 
from surfaces as a function of angle of incidence is important (compare Figures 2.5 
and 2.6). It may be modeled with bidirectional reflectance distribution functions
(BRDFs), which then become part of the surface material description. Advanced 
FIGURE 2.5 This object only reflects light diffusely. Because of the bright lighting conditions
under which this photograph was taken, this image should look bright overall and without a large 
variation in tone. 


rendering algorithms use this information to compute how light is distributed in a
scene, from which an HDR image of the scene may be generated [24,58].


2.2 PHOTOMETRY 


Surfaces reflect light and by doing so may alter the spectral composition of it. Thus, 
reflected light conveys spectral information of both the light source illuminating a 
surface point and the reflectance of the surface at that point. 

There are many wavelengths that are not detectable by the human eye, which 
is sensitive to wavelengths between approximately 380 and 830 nanometers (nm).
FIGURE 2.6 The metal surface of this clock causes highlights that are nearly as strong as the
light sources they reflect. The environment in which this image was taken is much darker than the 
one depicted in Figure 2.5. Even so, the highlights are much brighter. 


Within this range, the human eye is not equally sensitive to all wavelengths. In addi- 
tion, there are differences in sensitivity to the spectral composition of light among 
individuals. However, this range of sensitivity is small enough that the spectral sen- 
sitivity of any human observer with normal vision may be approximated with a 
single curve. Such a curve is standardized by the Commission Internationale de 
l’Eclairage (CIE) and is known as the $V(\lambda)$ curve (pronounced vee-lambda), or CIE
photopic luminous efficiency curve. This curve is plotted in Figure 2.7.

FIGURE 2.7 CIE standard observer photopic luminous efficiency curve.


In that we are typically interested in how humans perceive light, its spectral
composition may be weighted according to $V(\lambda)$. The science of measuring light
in units that are weighted in this fashion is called photometry. All radiometric terms
introduced in the previous section have photometric counterparts, which are outlined
in Table 2.2. By spectrally weighting radiometric quantities with $V(\lambda)$, they
are converted into photometric quantities.

Luminous flux (or luminous power) is photometrically weighted radiant flux. It
is measured in lumens; one lumen is defined as 1/683 watt of radiant power at a
frequency of $540 \times 10^{12}$ Hz. This frequency corresponds to the wavelength for which
humans are maximally sensitive (about 555 nm). If luminous flux is measured over
a differential solid angle, the quantity obtained is luminous intensity, measured in
Quantity                     Unit
Luminous power (P_v)         lm (lumen)
Luminous energy (Q_v)        lm s
Luminous exitance (M_v)      lm m^-2
Illuminance (E_v)            lm m^-2
Luminous intensity (I_v)     lm sr^-1 = cd (candela)
Luminance (L_v)              cd m^-2 = nit

TABLE 2.2 Photometric quantities.


lumens per steradian. One lumen per steradian is equivalent to one candela. Lumi- 
nous exitance and illuminance are both given in lumens per square meter, whereas 
luminance is specified in candela per square meter (a.k.a. “nits”). 

Luminance is a perceived quantity. It is a photometrically weighted radiance and 
constitutes an approximate measure of how bright a surface appears. Luminance 
is the most relevant photometric unit to HDR imaging. Spectrally weighting radi- 
ance amounts to multiplying each spectral component with the corresponding value 
given by the weight function and then integrating all results, as follows. 


\[
L_v = \int_{380}^{830} L_{e,\lambda} \, V(\lambda) \, d\lambda
\]

The consequence of this equation is that there are many different spectral compositions
of radiance $L_e$ possible that would cause the same luminance value $L_v$. It
is therefore not possible to apply this formula and expect the resulting luminance 
value to be a unique representation of the associated radiance value. 
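
The many-to-one nature of this integral can be checked numerically. The following is a minimal sketch rather than a reference implementation: `v_lambda_approx` is a hypothetical Gaussian stand-in for the tabulated CIE curve (real work should use the published table), and the names `luminance`, `flat`, and `band` are illustrative. Like the formula above, it omits any scaling constant.

```python
import math

def v_lambda_approx(wavelength_nm):
    """Crude Gaussian stand-in for V(lambda), peaking at 555 nm.
    ASSUMPTION: illustrative only; this is not the tabulated CIE curve."""
    return math.exp(-0.5 * ((wavelength_nm - 555.0) / 45.0) ** 2)

def luminance(spectral_radiance, lo=380, hi=830, step=1):
    """Trapezoidal approximation of the integral of L_e(lambda) * V(lambda)
    over [lo, hi] nm. `spectral_radiance` is a function of wavelength in nm."""
    total = 0.0
    for wl in range(lo, hi, step):
        a = spectral_radiance(wl) * v_lambda_approx(wl)
        b = spectral_radiance(wl + step) * v_lambda_approx(wl + step)
        total += 0.5 * (a + b) * step
    return total

# Two clearly different spectra: a flat one and a narrow band near 600 nm.
def flat(wl):
    return 1.0

def band(wl):
    return 1.0 if 590 <= wl <= 610 else 0.0

# Scale the narrow band so that both spectra integrate to the same luminance.
scale = luminance(flat) / luminance(band)

def scaled_band(wl):
    return scale * band(wl)

print(luminance(flat), luminance(scaled_band))
```

The two printed values agree up to rounding even though the spectra differ at almost every wavelength, which is why a luminance value cannot be inverted to recover the radiance spectrum.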

The importance of luminance in HDR imaging lies in the fact that it provides a 
natural boundary of visible wavelengths. Any wavelength outside the visible range 
does not need to be recorded, stored, or manipulated, in that human vision is
not capable of detecting those wavelengths. Many tone-reproduction operators first 
extract a luminance value from the red, green, and blue components of each pixel 
prior to reducing the dynamic range, in that large variations in luminance over 
orders of magnitude have a greater bearing on perception than extremes of color 
(see also Section 7.1.2). 


2.3 COLORIMETRY 


The field of colorimetry is concerned with assigning numbers to physically defined 
stimuli such that stimuli with the same specification look alike (i.e., match). One 
of the main results from color-matching experiments is that over a wide range of 
conditions almost all colors may be visually matched by adding light from three 
suitably pure stimuli. These three fixed stimuli are called primary stimuli. Color- 
matching experiments take three light sources and project them to one side of a 
white screen. A fourth light source, the target color, is projected to the other side 
of the screen. Participants in the experiments are given control over the intensity of 
each of the three primary light sources and are asked to match the target color. 

For each spectral target, the intensity of the three primaries may be adjusted
to create a match. By recording the intensities of the three primaries for each target
wavelength, three functions $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and $\bar{b}(\lambda)$ may be created. These are
called color-matching functions. The color-matching functions obtained by Stiles
and Burch are plotted in Figure 2.8. They used primary light sources that were
nearly monochromatic with peaks centered on $\lambda_R = 645.2$ nm, $\lambda_G = 525.3$ nm,
and $\lambda_B = 444.4$ nm [122]. The stimuli presented to the observers in these experiments
span 10 degrees of visual angle, and hence these functions are called
10-degree color-matching functions. Because the recorded responses vary only a
small amount between observers, these color-matching functions are representative
of normal human vision. As a result, they were adopted by the CIE to describe the
“CIE 1964 standard observer.” Thus, a linear combination of three spectral functions
will yield a fourth, $Q_\lambda$, which may be visually matched to a linear combination of
primary stimuli as follows.


\[
Q_\lambda = \bar{r}(\lambda) R + \bar{g}(\lambda) G + \bar{b}(\lambda) B
\]

FIGURE 2.8 Stiles and Burch (1959) 10-degree color-matching functions.


Here, R, G, and B are scalar multipliers. Because the primaries are fixed, the stimulus
$Q_\lambda$ may be represented as a triplet by listing R, G, and B. This (R, G, B)
triplet is then called the tristimulus value of $Q_\lambda$.

For any three real primaries,¹ it is sometimes necessary to supply a negative
amount to reach some colors (i.e., there may be one or more negative components
of a tristimulus value). In that it is simpler to deal with a color space whose
tristimulus values are always positive, the CIE has defined alternative color-matching
functions chosen such that any color may be matched with positive primary coefficients.
These color-matching functions are named $\bar{x}(\lambda)$, $\bar{y}(\lambda)$, and $\bar{z}(\lambda)$ (plotted
in Figure 2.9). These functions are the result of experiments in which the stimulus
spanned 2 degrees of visual angle and are therefore known as the “CIE 1931 standard
observer” [149]. A spectral stimulus may now be matched in terms of these
color-matching functions, as follows.

\[
Q_\lambda = \bar{x}(\lambda) X + \bar{y}(\lambda) Y + \bar{z}(\lambda) Z
\]

FIGURE 2.9 CIE 1931 2-degree XYZ color-matching functions.

1 Real or realizable primaries are those that can be obtained by physical devices. For such primaries it is not possible to
supply negative amounts because light cannot be subtracted from a scene. However, although less desirable in practice,
there is no mathematical reason a tristimulus value could not be converted such that it would be represented by a
different set of primaries. Some of the values might then become negative. Such conversion issues are discussed
further in Section 2.4.


For a given stimulus $Q_\lambda$, the tristimulus values (X, Y, Z) are obtained by integration,
as follows.

\[
\begin{aligned}
X &= \int_{380}^{830} Q_\lambda \, \bar{x}(\lambda) \, d\lambda \\
Y &= \int_{380}^{830} Q_\lambda \, \bar{y}(\lambda) \, d\lambda \\
Z &= \int_{380}^{830} Q_\lambda \, \bar{z}(\lambda) \, d\lambda
\end{aligned}
\]

The CIE XYZ matching functions are defined such that a theoretical equal-energy
stimulus, which would have unit radiant power at all wavelengths, maps to tristimulus
value (1, 1, 1). Further, note that $\bar{y}(\lambda)$ is equal to $V(\lambda)$, another intentional
choice by the CIE. Thus, Y represents photometrically weighted quantities.

For any visible color, the tristimulus values in XYZ space are all positive. However,
as a result the CIE primaries are not realizable by any physical device. Such primaries
are called “imaginary,” as opposed to realizable primaries, which are called
“real.”² Associated with tristimulus values are chromaticity coordinates, which may
be computed from tristimulus values as follows.


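\[
\begin{aligned}
x &= \frac{X}{X + Y + Z} \\
y &= \frac{Y}{X + Y + Z} \\
z &= \frac{Z}{X + Y + Z} = 1 - x - y
\end{aligned}
\]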


Because z is known if x and y are known, only the latter two chromaticity coor- 
dinates need to be kept. Chromaticity coordinates are relative, which means that 


2 This has nothing to do with the mathematical formulation of “real” and “imaginary” numbers. 

FIGURE 2.10 CIE xy chromaticity diagram showing the range of colors humans can distinguish
(left). On the right, the triangular gamut spanned by the primaries defined by ITU
Recommendation (ITU-R) BT.709 color space [57] is shown.


within a given system of primary stimuli two colors with the same relative spectral 
power distribution will map to the same chromaticity coordinates. An equal-energy 
stimulus will map to coordinates (x = 1/3, y = 1/3). 
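
As a small illustration of the relative nature of chromaticity coordinates, the following sketch computes (x, y) from a tristimulus value. The function name `xyz_to_xy` is an assumption made for this example.

```python
def xyz_to_xy(X, Y, Z):
    """Return chromaticity coordinates (x, y); z is implied as 1 - x - y."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("chromaticity is undefined when X + Y + Z = 0")
    return X / total, Y / total

x, y = xyz_to_xy(1.0, 1.0, 1.0)      # equal-energy stimulus
x2, y2 = xyz_to_xy(5.0, 5.0, 5.0)    # the same stimulus, five times stronger
print(x, y, x2, y2)
```

Scaling all three tristimulus values by the same factor leaves (x, y) unchanged, so absolute intensity has to be carried separately (for example, by keeping Y).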

Chromaticity coordinates may be plotted in a chromaticity diagram with two 
axes. A CIE xy chromaticity diagram is shown in Figure 2.10. All monochromatic 
wavelengths map to a position along the curved boundary, called the spectral locus, 
which is of horseshoe shape. The line between red and blue is called the “pur- 
ple line,” which represents the locus of additive mixtures of short- and long-wave 
stimuli. 

The three primaries used for any given color space will map to three points in 
a chromaticity diagram and thus span a triangle. This triangle contains the range 
of colors that may be represented by these primaries (assuming nonnegative tri- 
stimulus values). The range of realizable colors given a set of primaries is called 
the color gamut. Colors that are not representable in a given color space are called
out-of-gamut colors. 
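
Whether a chromaticity lies inside the triangle spanned by three primaries can be tested with a standard point-in-triangle check. The sketch below is one possible way to do this, not a method prescribed by the text; it uses the BT.709 primaries listed in Table 2.3, and the names `in_gamut` and `_cross` are illustrative. It considers chromaticity only and ignores the luminance dimension of the gamut.

```python
# xy chromaticities of the BT.709 primaries R, G, B (Table 2.3).
BT709_PRIMARIES = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]

def _cross(o, a, b):
    """z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_gamut(xy, primaries=BT709_PRIMARIES):
    """True if chromaticity `xy` lies inside or on the primaries' triangle."""
    r, g, b = primaries
    d1 = _cross(r, g, xy)
    d2 = _cross(g, b, xy)
    d3 = _cross(b, r, xy)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # The point is inside when it lies on the same side of all three edges.
    return not (has_negative and has_positive)

print(in_gamut((0.3127, 0.3290)))   # the BT.709 white point
print(in_gamut((0.10, 0.80)))       # a saturated green chromaticity
```

The second chromaticity has a larger y value than any of the three primaries, so it cannot be reached with nonnegative amounts of them.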

The gamut for the primaries defined by ITU-R (International Telecommunica- 
tion Union Recommendations) BT.709 is shown on the right in Figure 2.10. These 
primaries are a reasonable approximation of most CRT computer monitors and of- 
ficially define the boundaries of the sRGB color space [124] (see Section 2.11). The 
triangular region shown in this figure marks the range of colors that may be dis- 
played on a standard monitor. The colors outside this triangle cannot be represented 
on most displays. They also cannot be stored in an sRGB file, such as the one used 
for this figure. We are therefore forced to show incorrect colors outside the sRGB 
gamut in all chromaticity diagrams in this book. 

The diagrams in Figure 2.10 show two dimensions of what is a 3D space. The 
third dimension (luminance) goes out of the page, and the color gamut is really a 
volume of which a slice is depicted. In the case of the sRGB color space, the gamut 
is shaped as a six-sided polyhedron, often referred to as the “RGB color cube.” This 
is misleading, however, in that the sides are only equal in the encoding (0-255 
thrice) and are not very equal perceptually. 

It may be possible for two stimuli with different spectral radiant power distributions
to match against the same linear combination of primaries, and thus
to be represented by the same set of tristimulus values. This phenomenon is called
metamerism. Whereas metameric stimuli will map to the same location in a chro- 
maticity diagram, stimuli that appear different will map to different locations. The 
magnitude of the perceived difference between two stimuli may be expressed as 
the Cartesian distance between the two points in a chromaticity diagram. However, 
in the 1931 CIE primary system the chromaticity diagram is not uniform (i.e., the 
distance between two points located in one part of the diagram corresponds to a 
different perceived color difference than two points located elsewhere in the dia- 
gram). Although CIE XYZ is still the basis for all color theory, this nonuniformity 
has given rise to alternative color spaces (discussed in the following sections). 


2.4 COLOR SPACES 


Color spaces encompass two different concepts. First, they are represented by a set 
of formulas that define a relationship between a color vector (or triplet) and the 
standard CIE XYZ color space. This is most often given in the form of a 3-by-3
color transformation matrix, although there are additional formulas if the space is 
nonlinear. Second, a color space is a 2D boundary on the volume defined by this 
vector, usually determined by the minimum and maximum value of each primary 
— the color gamut. Optionally, the color space may have an associated quantization 
if it has an explicit binary representation. In this section, linear transformations are 
discussed, whereas subsequent sections introduce nonlinear encodings and quanti- 
zation. 

We can convert from one tristimulus color space to any other tristimulus space 
using a 3-by-3 matrix transformation. Usually the primaries are known by their 
xy chromaticity coordinates. In addition, the white point needs to be specified,
which is given as an xy chromaticity pair $(x_W, y_W)$ plus maximum luminance $Y_W$.
The white point is the color associated with equal contributions of each primary 
(discussed further in the following section). 

Given the chromaticity coordinates of the primaries, first the z chromaticity
coordinate for each primary is computed to yield chromaticity triplets for each
primary; namely, $(x_R, y_R, z_R)$, $(x_G, y_G, z_G)$, and $(x_B, y_B, z_B)$. From the white point’s
chromaticities and its maximum luminance, the tristimulus values $(X_W, Y_W, Z_W)$
are calculated. Then, the following set of linear equations is solved for $S_R$, $S_G$,
and $S_B$.


\[
\begin{aligned}
X_W &= x_R S_R + x_G S_G + x_B S_B \\
Y_W &= y_R S_R + y_G S_G + y_B S_B \\
Z_W &= z_R S_R + z_G S_G + z_B S_B
\end{aligned}
\]


The conversion matrix to convert from RGB to XYZ is then given by 


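\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
x_R S_R & x_G S_G & x_B S_B \\
y_R S_R & y_G S_G & y_B S_B \\
z_R S_R & z_G S_G & z_B S_B
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]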


The conversion from XYZ to RGB may be computed by inverting this matrix. If the 
primaries are unknown, or if the white point is unknown, a second-best solution is

        R         G         B         White
x       0.6400    0.3000    0.1500    0.3127
y       0.3300    0.6000    0.0600    0.3290

TABLE 2.3 Primaries and white point specified by ITU-R Recommendation BT.709.


to use a standard matrix such as that specified by ITU-R BT.709 [57]: 


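\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
0.4124 & 0.3576 & 0.1805 \\
0.2126 & 0.7152 & 0.0722 \\
0.0193 & 0.1192 & 0.9505
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]

\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix}
 3.2405 & -1.5371 & -0.4985 \\
-0.9693 &  1.8760 &  0.0416 \\
 0.0556 & -0.2040 &  1.0572
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\]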


The primaries and white point used to create this conversion matrix are outlined in 
Table 2.3. 
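
The construction described above (compute z for each primary, derive the white point's tristimulus values, solve for the three scale factors, and scale the columns) can be sketched as follows. This is a minimal illustration under stated assumptions: the white point luminance is normalized to 1, and the helper names `solve3` and `rgb_to_xyz_matrix` are hypothetical.

```python
def solve3(a, b):
    """Solve the 3x3 linear system a * s = b using Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]          # replace one column with the right-hand side
        solution.append(det(m) / d)
    return solution

def rgb_to_xyz_matrix(primaries_xy, white_xy, white_Y=1.0):
    """Build the RGB-to-XYZ matrix from xy primaries and an xy white point."""
    # Chromaticity triplets (x, y, z), one per primary.
    triplets = [(x, y, 1.0 - x - y) for x, y in primaries_xy]
    # Rows are x, y, z; columns are the R, G, B primaries.
    a = [[triplets[c][r] for c in range(3)] for r in range(3)]
    xw, yw = white_xy
    white = [white_Y * xw / yw, white_Y, white_Y * (1.0 - xw - yw) / yw]
    s = solve3(a, white)              # S_R, S_G, S_B
    return [[a[r][c] * s[c] for c in range(3)] for r in range(3)]

# Primaries and white point from Table 2.3.
M = rgb_to_xyz_matrix([(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)],
                      (0.3127, 0.3290))
for row in M:
    print(["%.4f" % value for value in row])
```

With the values of Table 2.3 as input, the printed matrix should agree with the BT.709 RGB-to-XYZ matrix quoted above to roughly three decimal places; small differences in the last digit come from rounding of the tabulated chromaticities.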

There are several standard color spaces, each used in a particular field of science 
and engineering. Each is reached by constructing a conversion matrix, the previous 
matrix being an example. Several of these color spaces include a nonlinear transform 
akin to gamma correction, which is explained in Section 2.9. We therefore defer a 
discussion of other standard color spaces until Section 2.11. 

In addition to standard color spaces, most cameras, scanners, monitors, and TVs
use their own primaries (called spectral responsivities in the case of capturing devices).
Thus, each device may use a different color space. Conversion between these
color spaces is thus essential for the faithful reproduction of an image on any given 
display device. 

If a color is specified in a device-dependent RGB color space, its luminance may 
be computed because the Y component in the XYZ color space represents luminance 
(recall that $V(\lambda)$ equals $\bar{y}(\lambda)$). Thus, a representation of luminance is obtained by
computing a linear combination of the red, green, and blue components according 
to the middle row of the RGB-to-XYZ conversion matrix. For instance, luminance 
may be computed from ITU-R BT.709 RGB as follows. 


Y = 0.2126R + 0.7152G + 0.0722B 
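
In code, this is a single weighted sum. The sketch below assumes the RGB values are linear (not gamma-encoded) BT.709 values; the names `BT709_WEIGHTS` and `luminance_bt709` are illustrative.

```python
# Middle row of the BT.709 RGB-to-XYZ matrix.
BT709_WEIGHTS = (0.2126, 0.7152, 0.0722)

def luminance_bt709(r, g, b):
    """Relative luminance Y of a linear BT.709 RGB triplet."""
    wr, wg, wb = BT709_WEIGHTS
    return wr * r + wg * g + wb * b

print(luminance_bt709(1.0, 1.0, 1.0))   # white
print(luminance_bt709(0.0, 1.0, 0.0))   # pure green primary
print(luminance_bt709(0.0, 0.0, 1.0))   # pure blue primary
```

The weights show that the green primary contributes most of the luminance and the blue primary very little.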


Finally, an important consequence of color metamerism is that if the spectral re- 
sponsivities (primaries) associated with a camera are known, as well as the emissive 
spectra of the three phosphors of a CRT display, we may be able to specify a transfor- 
mation between the tristimulus values captured with the camera and the tristimulus 
values of the display and thus reproduce the captured image on the display. This 
would, of course, only be possible if the camera and display technologies did not 
impose restrictions on the dynamic range of captured and displayed data. 


2.5 WHITE POINT AND ILLUMINANTS 


For the conversion of tristimulus values between XYZ and a specific RGB color space, 
the primaries of the RGB color space must be specified. In addition, the white point 
needs to be known. For a display device, the white point is the color emitted if all 
three color channels are contributing equally. 

Similarly, within a given scene the dominant light source will produce a color 
cast that will affect the appearance of the objects in the scene. The color of a light 
source (illuminant) may be determined by measuring a diffusely reflecting white 
patch. The color of the illuminant therefore determines the color of a scene the 
human visual system normally associates with white. 

An often-used reference light source is CIE illuminant D65. This light source
may be chosen if no further information is available regarding the white point of
a device, or regarding the illuminant of a scene. Its spectral power distribution is
shown in Figure 2.11, along with two related standard illuminants, D55 (commonly
used in photography) and D75.

Cameras often operate under the assumption that the scene is lit by a specific
light source, such as D65. If the lighting in a scene has a substantially different
color, an adjustment to the gain of the red, green, and blue sensors in the camera
may be made. This is known as white balancing [100]. If the white balance chosen
for a particular scene were incorrect, white balancing might be attempted as an
image-processing step.

FIGURE 2.11 Spectral power distribution of CIE illuminants D55, D65, and D75.

The difference between illuminants may be expressed in terms of chromaticity 
coordinates, but a more commonly used measure is correlated color temperature. 
Consider a blackbody radiator, a cavity in a block of material heated to a certain 
temperature. The spectral power distribution emitted by the walls of this cavity 
is a function of the temperature of the material only. The color of a blackbody 
radiator may thus be characterized by its temperature, which is measured in
kelvins (K).

The term color temperature refers to the temperature of a selective radiator that has 
chromaticity coordinates very close to that of a blackbody. The lower the temperature,
the redder the appearance of the radiator. For instance, tungsten illumination
(about 3,200 K) appears somewhat yellow. Higher color temperatures have a more
bluish appearance. 
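
The shift from reddish to bluish with rising temperature can be illustrated with Planck's law for blackbody spectral radiance. This is a rough sketch only: comparing the radiance at 450 nm and 650 nm is an arbitrary choice made here to stand in for "blue" and "red", and it is not a computation of chromaticity or of correlated color temperature.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light in vacuum (m/s)
K = 1.380649e-23     # Boltzmann constant (J/K)

def planck(wavelength_nm, temperature_k):
    """Blackbody spectral radiance at one wavelength (Planck's law)."""
    lam = wavelength_nm * 1e-9
    exponent = H * C / (lam * K * temperature_k)
    return (2.0 * H * C ** 2 / lam ** 5) / math.expm1(exponent)

def blue_red_ratio(temperature_k):
    """Radiance at 450 nm relative to 650 nm (sample wavelengths are assumptions)."""
    return planck(450.0, temperature_k) / planck(650.0, temperature_k)

for t in (1850, 3200, 6504, 9300):
    print(t, round(blue_red_ratio(t), 3))
```

The ratio grows with temperature: a blackbody at 3,200 K emits less at 450 nm than at 650 nm, whereas at 9,300 K the short-wavelength end dominates.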

Scene                          T (in K)    x        y
Candle flame                   1850        0.543    0.410
Sunrise/sunset                 2000        0.527    0.413
Tungsten (TV/film)             3200        0.427    0.398
Summer sunlight at noon        5400        0.326    0.343
CIE A (incandescent)           2854        0.448    0.408
CIE B (direct sunlight)        4874        0.384    0.352
CIE C (overcast sky)           6774        0.310    0.316
CIE D50 (noon skylight)        5000        0.346    0.359
CIE D65 (average daylight)     6504        0.313    0.329
CIE E (equal energy)           5500        0.333    0.333
CIE F2 (office fluorescent)    4150        0.372    0.375

TABLE 2.4 Correlated color temperature T and chromaticity coordinates (xy) for
common scene types and a selection of CIE illuminants.


The term correlated color temperature is more generally used for illuminants that do 
not have chromaticity coordinates close to those generated by blackbody radiators. 
It refers to the blackbody’s temperature that most closely resembles the perceived 
color of the given selective radiator under the same brightness and specified viewing 
conditions. Table 2.4 outlines the correlated color temperature of several common
scene types and CIE illuminants, as well as their associated chromaticity coordinates.

The CIE standard illuminant D65, shown in Figure 2.11, is defined as natural
daylight with a correlated color temperature of 6,504 K. The D55 and D75 illuminants
have correlated color temperatures of 5,503 and 7,504 K, respectively. Many
color spaces are defined with a D65 white point. In photography, D55 is often used.
Display devices often use a white point of 9,300 K, which tends toward blue. The 
reason for this is that blue phosphors are relatively efficient and allow the overall 
display brightness to be somewhat higher, at the cost of color accuracy [100]. 

Humans are very capable of adapting to the color of the light source in a scene.
The impression of color given by a surface depends on its reflectance as well as 
the light source illuminating it. If the light source is gradually changed in color, 
humans will adapt and still perceive the color of the surface the same, although 
light measurements of the surface would indicate a different spectral composition 
and CIE XYZ tristimulus value [125]. This phenomenon is called chromatic adapta- 
tion. The ability to perceive the color of a surface independently of the light source 
illuminating it is called color constancy. 

Typically, when viewing a real scene an observer would be chromatically adapted 
to that scene. If an image of the same scene were displayed on a display device, the 
observer would be adapted to the display device and the scene in which the observer 
viewed the image. It is reasonable to assume that these two states of adaptation will 
generally be different. As such, the image shown is likely to be perceived differ- 
ently than the real scene. Accounting for such differences should be an important 
aspect of HDR imaging, and in particular tone reproduction. Unfortunately, too 
many tone-reproduction operators ignore these issues, although the photoreceptor- 
based operator, iCAM, and the Multiscale Observer Model include a model of chro- 
matic adaptation (see Sections 7.2.7, 7.3.3, and 7.3.4), and Akyuz et al. have shown 
that tone reproduction and color appearance modeling may be separated into two 
steps [4]. 

In 1902, von Kries speculated that chromatic adaptation is mediated by the three 
cone types in the retina [90]. Chromatic adaptation occurs as the red, green, and 
blue cones each independently adapts to the illuminant. 

A model of chromatic adaptation may thus be implemented by transforming tri- 
stimulus values into a cone response domain and then individually scaling the red, 
green, and blue components according to the current and desired illuminants. There 
exist different definitions of cone response domains leading to different transforms. 
The first cone response domain is given by the LMS color space, with L, M, and 
S standing respectively for long, medium, and short wavelengths. The matrix that 
converts between XYZ and LMS lies at the heart of the von Kries transform and is
denoted $M_{\mathrm{vonKries}}$, as in the following.


\[
M_{\mathrm{vonKries}} =
\begin{bmatrix}
 0.3897 & 0.6890 & -0.0787 \\
-0.2298 & 1.1834 &  0.0464 \\
 0.0000 & 0.0000 &  1.0000
\end{bmatrix}
\]

FIGURE 2.12 Relative response functions for the von Kries chromatic adaptation transform.
(Reprinted from [36].)

As the LMS cone space represents the response of the cones in the human visual
system, it is a useful starting place for computational models of human vision. It 
is also a component in the iCAM color appearance model (see Section 7.3.3). The 
relative response as a function of wavelength is plotted in Figure 2.12. 

A newer cone response domain is given by the Bradford chromatic adaptation 
transform [64,76] (see Figure 2.13), as follows. 


\[
M_{\mathrm{Bradford}} =
\begin{bmatrix}
 0.8951 &  0.2664 & -0.1614 \\
-0.7502 &  1.7135 &  0.0367 \\
 0.0389 & -0.0685 &  1.0296
\end{bmatrix}
\]

\[
M_{\mathrm{Bradford}}^{-1} =
\begin{bmatrix}
 0.9870 & -0.1471 & 0.1600 \\
 0.4323 &  0.5184 & 0.0493 \\
-0.0085 &  0.0400 & 0.9685
\end{bmatrix}
\]

FIGURE 2.13 Relative response functions for the Bradford chromatic adaptation transform.
(Reprinted from [36].)


A third chromatic adaptation transform (see Figure 2.14) is used in the CIECAM02
color appearance model (described in Section 2.8), as follows.

\[
M_{\mathrm{CAT02}} =
\begin{bmatrix}
 0.7328 & 0.4296 & -0.1624 \\
-0.7036 & 1.6975 &  0.0061 \\
 0.0030 & 0.0136 &  0.9834
\end{bmatrix}
\]

\[
M_{\mathrm{CAT02}}^{-1} =
\begin{bmatrix}
 1.0961 & -0.2789 & 0.1827 \\
 0.4544 &  0.4735 & 0.0721 \\
-0.0096 & -0.0057 & 1.0153
\end{bmatrix}
\]

FIGURE 2.14 Relative response functions for the CAT02 chromatic adaptation transform.
(Reprinted from Mark Fairchild’s slides.)


These three chromatic adaptation transforms may be used to construct a matrix 
that will transform XYZ tristimulus values for a given white point to a new white 
point [126]. If the source white point is given as $(X_S, Y_S, Z_S)$ and the destination
white point as $(X_D, Y_D, Z_D)$, their transformed values are


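\[
\begin{bmatrix} \rho_S \\ \gamma_S \\ \beta_S \end{bmatrix} =
M_{\mathrm{cat}} \begin{bmatrix} X_S \\ Y_S \\ Z_S \end{bmatrix}
\qquad
\begin{bmatrix} \rho_D \\ \gamma_D \\ \beta_D \end{bmatrix} =
M_{\mathrm{cat}} \begin{bmatrix} X_D \\ Y_D \\ Z_D \end{bmatrix}
\]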

where $M_{\mathrm{cat}}$ is one of the three chromatic adaptation matrices $M_{\mathrm{vonKries}}$, $M_{\mathrm{Bradford}}$,
or $M_{\mathrm{CAT02}}$. A chromatic adaptation matrix for these specific white points may be
constructed by concatenating the chosen matrix and its inverse
with a diagonal matrix that independently scales the three cone responses, as follows.

\[
M = M_{\mathrm{cat}}^{-1}
\begin{bmatrix}
\rho_D / \rho_S & 0 & 0 \\
0 & \gamma_D / \gamma_S & 0 \\
0 & 0 & \beta_D / \beta_S
\end{bmatrix}
M_{\mathrm{cat}}
\]


Chromatically adapting an XYZ tristimulus value is now a matter of transforming it 
with matrix M, as follows. 


\[
\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} =
M \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\]


Here, (X’, Y’, Z’) is the CIE tristimulus value whose appearance under the target 
illuminant most closely matches the original XYZ tristimulus under the source illu- 
minant. 
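
The construction of the adaptation matrix can be sketched in a few lines. This is an illustration under stated assumptions rather than a definitive implementation: it uses the Bradford matrix, takes the D65 and tungsten chromaticities from Table 2.4, normalizes both white points to Y = 1, and computes the matrix inverse numerically instead of using the printed inverse. All function names are hypothetical.

```python
M_BRADFORD = [[0.8951, 0.2664, -0.1614],
              [-0.7502, 1.7135, 0.0367],
              [0.0389, -0.0685, 1.0296]]

def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def xy_to_xyz(x, y, Y=1.0):
    """Tristimulus value for chromaticity (x, y) at luminance Y."""
    return [Y * x / y, Y, Y * (1.0 - x - y) / y]

def adaptation_matrix(src_white, dst_white, m_cat=M_BRADFORD):
    """M = M_cat^-1 * diag(rho_D/rho_S, gamma_D/gamma_S, beta_D/beta_S) * M_cat."""
    rho_s, gamma_s, beta_s = mat_vec(m_cat, src_white)
    rho_d, gamma_d, beta_d = mat_vec(m_cat, dst_white)
    scale = [[rho_d / rho_s, 0.0, 0.0],
             [0.0, gamma_d / gamma_s, 0.0],
             [0.0, 0.0, beta_d / beta_s]]
    return mat_mul(inverse3(m_cat), mat_mul(scale, m_cat))

d65 = xy_to_xyz(0.313, 0.329)        # CIE D65 (Table 2.4)
tungsten = xy_to_xyz(0.427, 0.398)   # tungsten (Table 2.4)
M = adaptation_matrix(d65, tungsten)
adapted_white = mat_vec(M, d65)
print(adapted_white, tungsten)
```

A useful sanity check, independent of which cone response matrix is chosen, is that the source white point maps exactly onto the destination white point.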

Chromatic adaptation transforms are useful for preparing an image for display 
under different lighting conditions. Thus, if the scene were lit by daylight and an 
image of that scene viewed under tungsten lighting, a chromatic adaptation trans- 
form might be used to account for this difference. After applying the chromatic 
adaptation transform, the (X’, Y’, Z’) tristimulus values need to be converted to an 
RGB color space with a matrix that takes into account the white point of the dis- 
play environment. Thus, if the image is to be viewed under tungsten lighting, the 
XYZ-to-RGB transformation matrix should be constructed using the white point of 
a tungsten light source. 

As an example, Figure 2.15 shows an image lit with daylight approximating
D65.³ This figure shows the image prepared for several different viewing environments.
In each case, the CAT02 chromatic adaptation transform was used, and the
conversion to RGB color space was achieved by constructing a conversion matrix
with the appropriate white point.


3 This image was taken in a conservatory in Rochester, New York, under cloud cover. The CIE D65 standard light source
was derived from measurements originally taken from similar daylight conditions in Rochester.

FIGURE 2.15 CAT02 chromatic adaptation. In reading order: original image, followed by five
images chromatically adapted from D65 to incandescent, tungsten, D50, E, and F2.

FIGURE 2.16 Comparison of different chromatic adaptation transforms. In reading
order: original image, followed by von Kries, Bradford, and CAT02 transforms. The final image is
the chromatic adaptation transform applied directly in XYZ space. The transform is from D65 to
tungsten.


The difference among the three different chromatic adaptation transforms is 
illustrated in Figure 2.16. Also shown in this figure is a chromatic adaptation per- 
formed directly in XYZ space, here termed XYZ scaling. The scene depicted here was 
created with only the outdoor lighting available and was taken in the same con- 
servatory as the images in Figure 2.15. Thus, the lighting in this scene would be 
reasonably well approximated with a D65 illuminant. Figure 2.16 shows transforms
from D65 to tungsten.

The spectral sensitivities of the cones in the human visual system are broadband;
that is, each of the red, green, and blue cone types (as well as the rods) is sensitive
to a wide range of wavelengths, as indicated by their absorbance spectra (shown
in Figure 2.17) [16]. As a result, there is significant overlap between the different
cone types, although their peak sensitivities lie at different wavelengths.

FIGURE 2.17 Spectral absorbance spectra for the L, M, and S cones, as well as the rods
(after [16]).

It is possible to construct new spectral response functions that are more narrow- 
band by computing a linear combination of the original response functions. The 
graphs of the resulting response functions look sharper, and the method is therefore 
called “spectral sharpening.” Within a chromaticity diagram, the three corners of the 
color gamut lie closer to the spectral locus, or even outside, and therefore the gamut 
is “wider” so that a greater range of visible colors can be represented. 

A second advantage of applying such a transform is that the resulting tristimulus
values become more decorrelated. This has advantages in color constancy algorithms;
that is, algorithms that aim to recover surface reflectance from an image that
has recorded the combined effect of surface reflectance and illuminance [7]. It also
helps to reduce visible errors in color-rendering algorithms [208].


2.6 COLOR CORRECTION 


Without the camera response function (Section 4.6), one cannot linearize the in- 
put as needed for color correction. Thus, a color-correction value will not apply 
equally to all parts of the tone scale. For instance, darker colors may end up too 
blue compared to lighter colors. Furthermore, colors with primary values clamped 
to the upper limit (255 in an 8-bit image) have effectively been desaturated by 
the camera. Although users are accustomed to this effect in highlights, after color 
correction such desaturated colors may end up somewhere in the midtones, where 
desaturation is unexpected. In a naive method, whites may even be moved to some 
nonneutral value, which can be very disturbing. 

FIGURE 2.18 (a) A Macbeth ColorChecker chart captured with the appropriate white balance
setting under an overcast sky; (b) the same scene captured using the “incandescent” white balance
setting, resulting in a bluish color cast (red dots mark patches that cannot be corrected because one
or more primaries are clamped to 255); (c) an attempt to balance white using the second gray
patch, which was out of range in the original; (d) the best attempt at correction using the fourth
gray patch, which was at least in range in the original; and (e) range issues disappear in an HDR
original, allowing for proper post-correction.

Figure 2.18 demonstrates the problem of color correction from an LDR original.
If the user chooses one of the lighter patches for color balancing, the result may
be incorrect due to clamping in its value. (The captured RGB values for the gray
patches are shown in red.) Choosing a gray patch without clamping avoids this
problem, but it is impossible to recover colors for the clamped patches. In particular,
the lighter neutral patches end up turning pink in this example. The final image
shows how these problems are avoided when an HDR original is available. Because
the camera response curve has been eliminated along with clamping, the simple
approach of balancing colors by choosing a neutral patch and multiplying the image
by its inverse works quite well. 
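
The neutral-patch approach can be sketched in a few lines. The following is a minimal illustration, assuming linear (HDR) pixel values with no clamping; the function name `white_balance`, the sample pixel values, and the choice to normalize the gains to the green channel are assumptions made for this example only.

```python
def white_balance(pixels, neutral):
    """Scale linear RGB pixels so the chosen neutral patch becomes gray."""
    # Per-channel gains are the inverse of the patch color, normalized to
    # its green value so that overall exposure stays roughly the same.
    gains = [neutral[1] / c for c in neutral]
    return [tuple(v * g for v, g in zip(p, gains)) for p in pixels]


# Hypothetical linear (HDR) pixel values with a bluish cast.
gray_patch = (0.30, 0.40, 0.60)
image = [gray_patch, (0.15, 0.10, 0.90)]
balanced = white_balance(image, gray_patch)
```

Applied to clamped 8-bit data, the same multiplication would scale the clamped primaries as if they were valid measurements, which is the failure shown for the LDR case.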


2.7 COLOR OPPONENT SPACES 


With a 3-by-3 matrix, pixel data may be rotated into different variants of RGB color 
spaces to account for different primaries. A feature shared by all RGB color spaces is 
that for natural images correlations exist between the values in each color channel. 
In other words, if a pixel of a natural image has a large value for the red component, 
the probability of also finding a large value for the green and blue components is 
high. Thus, the three channels are highly correlated. 

An example image is shown in Figure 2.19. A set of randomly selected pixels is 
plotted three times in the same figure, where the axes of the plot are R-G, R-B, and 
G-B. This plot shows a point cloud of pixel data at an angle of about 45 degrees, no 
matter which channel is plotted against which. Thus, for this natural image strong 
correlations exist between the channels in RGB color space. 

This means that the amount of information carried by the three values compris- 
ing a pixel is less than three times the amount of information carried by each of 
the values. Thus, each color pixel carries some unquantified amount of redundant 
information. 

The human visual system deals with a similar problem. The information captured 
by the photoreceptors needs to be transmitted to the brain through the optic nerve. 
The amount of information that can pass through the optic nerve is limited and 
constitutes a bottleneck. In particular, the number of photoreceptors in the retina is 
far larger than the number of nerve endings that connect the eye to the brain. 

After light is absorbed by the photoreceptors, a significant amount of processing 
occurs in the next several layers of cells before the signal leaves the eye. One type of 
processing is a color space transformation to a color opponent space. Such a color space
is characterized by three channels: a luminance channel, a red-green channel, and
a yellow-blue channel.

The luminance channel ranges from dark to light and bears resemblance to the 
Y channel in CIE XYZ color space. The red-green channel ranges from red to green 
via neutral gray. The yellow-blue channel encodes the amount of blue versus the 
amount of yellow in a similar way to the red-green channel (Figure 2.20). This
encoding of chromatic information is the reason humans are able to describe colors
as reddish yellow (orange) or greenish blue. However, colors such as bluish yellow
and reddish green are never described because of this encoding (see [92]).

FIGURE 2.19 Scatter plot of RGB pixels randomly selected from the image at the top.

FIGURE 2.20 Original image (top left) split into a luminance channel (top right), a
yellow-blue channel (bottom left), and a red-green channel (bottom right). For the purpose of
visualization, the images depicting the yellow-blue and red-green channels are shown with the
luminance component present.

It is possible to analyze sets of natural images by means of principal components
analysis (PCA) [110]. This technique rotates multidimensional data such that the
axes align with the data as well as possible. Thus, the most important axis aligns
with the direction in space that shows the largest variation of data points. This is the
first principal component. The second principal component describes the direction
accounting for the second greatest variation in the data. This rotation therefore
decorrelates the data.

FIGURE 2.21 Scatter plot of Lαβ pixels randomly selected from the image of Figure 2.19.
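
As a small illustration of how such a rotation decorrelates data, the sketch below applies PCA to two channels only, where the rotation angle has a closed form. It is a toy example under stated assumptions: the sample values are made up, and the helper names `principal_angle`, `rotate`, and `covariance` are hypothetical.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def principal_angle(xs, ys):
    """Angle (radians) of the first principal component of 2D data."""
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def rotate(xs, ys, theta):
    """Center the data and rotate it so the first axis follows the main variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c, s = math.cos(theta), math.sin(theta)
    first = [c * (x - mx) + s * (y - my) for x, y in zip(xs, ys)]
    second = [-s * (x - mx) + c * (y - my) for x, y in zip(xs, ys)]
    return first, second


# Made-up, strongly correlated "red" and "green" samples.
red = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
green = [0.15, 0.18, 0.42, 0.55, 0.65, 0.95]
theta = principal_angle(red, green)
pc1, pc2 = rotate(red, green, theta)
```

For these samples the angle comes out close to 45 degrees, in line with the scatter plots discussed earlier, and the covariance between `pc1` and `pc2` vanishes up to rounding.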

If the technique is applied to images encoded in LMS color space (i.e., images 
represented in a format as thought to be output by the photoreceptors), a new set 
of decorrelated axes is produced. The surprising result is that the application of PCA 
to a set of natural images produces a color space that is closely matched to the color 
opponent space the human visual system employs [110]. 

A scatter plot of the image of Figure 2.19 in a color opponent space (Lαβ,
discussed later in this section) is shown in Figure 2.21. Here, the point clouds
are reasonably well aligned with one of the axes, indicating that the data is now 
decorrelated. The elongated shape of the point clouds indicates the ordering of the 
principal axes, luminance being most important and therefore most elongated. 


54 CHAPTER O2. LIGHT AND COLOR 


The decorrelation of data may be important, for instance, for color-correction 
algorithms. What would otherwise be a complicated 3D problem may be cast into 
three simpler 1D problems by solving the problem in a color opponent space [107]. 

At the same time, the first principal component (the luminance channel) ac- 
counts for the greatest amount of variation, whereas the two chromatic color oppo- 
nent channels carry less information. Converting an image into a color space with 
a luminance channel and two chromatic channels thus presents an opportunity to 
compress data because the latter channels would not require the same number of 
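bits as the luminance channel to accurately represent the image. The color opponent
space Lαβ that results from applying PCA to natural images may be approximated
by the following matrix transform, which converts between Lαβ and LMS (see
Section 2.5).

$$
\begin{bmatrix} L \\ \alpha \\ \beta \end{bmatrix} =
\begin{bmatrix} \frac{1}{\sqrt{3}} & 0 & 0 \\ 0 & \frac{1}{\sqrt{6}} & 0 \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & -2 \\ 1 & -1 & 0 \end{bmatrix}
\begin{bmatrix} \log L \\ \log M \\ \log S \end{bmatrix}
$$

$$
\begin{bmatrix} \log L \\ \log M \\ \log S \end{bmatrix} =
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -2 & 0 \end{bmatrix}
\begin{bmatrix} \frac{\sqrt{3}}{3} & 0 & 0 \\ 0 & \frac{\sqrt{6}}{6} & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} \end{bmatrix}
\begin{bmatrix} L \\ \alpha \\ \beta \end{bmatrix}
$$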


This color space has proved useful in algorithms such as the transfer of color be- 
tween images, where the colors are borrowed from one image and applied to a 
second image [107]. This algorithm computes means and standard deviations for 
each channel separately in both source and target images. Then, the pixel data in the 
target image are shifted and scaled such that the same mean and standard deviation 
as the source image are obtained. Applications of color transfer include the work of 
colorists, compositing, and matching rendered imagery with live video footage in 
mixed-reality applications. 
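
The statistics matching at the heart of this color transfer can be sketched as follows. This is only an illustration of the shift-and-scale step described above: it assumes the three channels of both images have already been converted to a decorrelated space, and the function names and sample data are hypothetical.

```python
import math


def mean_std(values):
    """Mean and (population) standard deviation of one channel."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / n)


def match_statistics(target, source):
    """Shift and scale `target` so its mean and std equal those of `source`."""
    mt, st = mean_std(target)
    ms, ss = mean_std(source)
    scale = ss / st if st > 0.0 else 1.0  # leave flat channels unscaled
    return [(v - mt) * scale + ms for v in target]


def transfer(target_channels, source_channels):
    """Treat each of the three channels as an independent 1D problem."""
    return [match_statistics(t, s) for t, s in zip(target_channels, source_channels)]


# Made-up channel data: three channels of three pixels each.
target = [[0.2, 0.4, 0.6], [0.0, 0.1, -0.1], [0.05, 0.0, -0.05]]
source = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.8], [-0.2, 0.0, 0.2]]
result = transfer(target, source)
```

Because the channels are handled independently, the approach relies on the decorrelation property discussed earlier; in a correlated space such as RGB the per-channel treatment would be less well justified.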
In addition, human sensitivity to chromatic variations is lower than to changes
in luminance. Chromatic channels may therefore be represented at a lower spatial
resolution than the luminance channel. This feature may be exploited in image
encodings by sampling the image at a lower resolution for the color opponent channels
than for the luminance channel. This is demonstrated in Figure 2.22, where the
full resolution image is shown on the left. The spatial resolution of the red-green
and yellow-blue channels is reduced by a factor of two for each subsequent image.
In Figure 2.23, the luminance channel was also reduced by a factor of two. The
artifacts in Figure 2.22 are much more benign than those in Figure 2.23.

FIGURE 2.22 The red-green and yellow-blue channels are reduced in spatial resolution by a
factor of 1, 2, 4, 8, 16, and 32.

FIGURE 2.23 All three channels in Lαβ space are reduced in spatial resolution by a factor of
1, 2, 4, 8, 16, and 32.
Subsampling of chromatic channels is used, for instance, in the YCbCr encoding
that is part of the JPEG file format and part of various broadcast standards, including
HDTV [100]. Conversion from RGB to YCbCr and back as used for JPEG is given by

$$
\begin{bmatrix} Y \\ C_B \\ C_R \end{bmatrix} =
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.168 & -0.333 & 0.498 \\ 0.498 & -0.417 & -0.081 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
$$

$$
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix} 1.000 & 0.000 & 1.397 \\ 1.000 & -0.343 & -0.711 \\ 1.000 & 1.765 & 0.000 \end{bmatrix}
\begin{bmatrix} Y \\ C_B \\ C_R \end{bmatrix}
$$

This conversion is based on ITU-R BT.601 [100]. Other color spaces that have
one luminance channel and two chromatic channels, such as CIELUV and CIELAB,
are discussed in the following section.
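
To make the YCbCr subsampling idea concrete, the sketch below converts a row of pixels to YCbCr, averages the two chromatic channels over pairs of neighbors, and converts back. It is a simplified illustration, not the JPEG pipeline: the coefficients are the rounded values of the conversion above (the first entry of the C_B row is assumed from the BT.601 definition), and the helper names and pixel values are hypothetical.

```python
RGB_TO_YCBCR = [
    (0.299, 0.587, 0.114),
    (-0.168, -0.333, 0.498),  # first entry assumed from BT.601
    (0.498, -0.417, -0.081),
]
YCBCR_TO_RGB = [
    (1.000, 0.000, 1.397),
    (1.000, -0.343, -0.711),
    (1.000, 1.765, 0.000),
]


def transform(matrix, vec):
    """Multiply a 3-by-3 matrix with a 3-vector."""
    return tuple(sum(m * v for m, v in zip(row, vec)) for row in matrix)


def subsample(channel, factor):
    """Average groups of `factor` neighboring samples and repeat the average."""
    out = []
    for i in range(0, len(channel), factor):
        block = channel[i:i + factor]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def subsample_chroma(rgb_row, factor=2):
    """Keep Y at full resolution; reduce the resolution of Cb and Cr."""
    ycc = [transform(RGB_TO_YCBCR, p) for p in rgb_row]
    y = [p[0] for p in ycc]
    cb = subsample([p[1] for p in ycc], factor)
    cr = subsample([p[2] for p in ycc], factor)
    return [transform(YCBCR_TO_RGB, t) for t in zip(y, cb, cr)]


row = [(0.8, 0.2, 0.2), (0.7, 0.3, 0.2), (0.2, 0.2, 0.8), (0.2, 0.3, 0.7)]
decoded = subsample_chroma(row)
```

The luminance of each decoded pixel stays essentially unchanged, while color differences between neighbors are averaged away, which is the trade-off exploited by these encodings.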


2.8 COLOR APPEARANCE 


The human visual system adapts to the environment it is viewing (see Chapter 6 for
more information). Observing a scene directly therefore generally creates a different
visual sensation than observing an image of that scene on an LDR display. In the
case of viewing a scene directly, the observer will be adapted to the scene. When
looking at an image on a display, the observer will be adapted to the light emitted
from the display, as well as to the environment in which the observer is located.

There may therefore be a significant mismatch between the state of adaptation 
of the observer in these two cases. This mismatch may cause the displayed image 
to be perceived differently from the actual scene. The higher the dynamic range 
of the scene the larger this difference may be. In HDR imaging, and in particular 
tone reproduction, it is therefore important to understand how human vision adapts 
to various lighting conditions and to develop models that predict how colors will 
be perceived under such different lighting conditions. This is the domain of color 
appearance modeling [27]. 

A color’s appearance is influenced by various aspects of the viewing environ- 
ment, such as the illuminant under which the stimulus is viewed. The chromatic 
adaptation transforms discussed in Section 2.5 are an important component of most 
color appearance models. 
FIGURE 2.24 Simultaneous color contrast shown for an identical gray patch displayed on
differently colored backgrounds.


The color of the area surrounding the stimulus also plays an important role, 
as demonstrated in Figure 2.24, where the same gray patch is shown on different 
backgrounds. The color of the patch will appear different in each case — an effect 
known as simultaneous color contrast. 

To characterize a stimulus within a specific environment, first its tristimulus 
value is specified in CIE XYZ color space. Second, attributes of the environment 
in which the stimulus is viewed need to be provided. If the stimulus is a homogeneous
reflecting patch of color on a neutral (gray) background, this characterization
of the environment may be as simple as the specification of an illuminant.

The appearance of a color is then described by “appearance correlates” that may 
be computed from the color’s tristimulus values as well as the description of the 
environment. Useful appearance correlates include lightness, chroma, hue, and sat- 
uration, which are defined later in this section. 

Appearance correlates are not computed directly in XYZ color space, but require 
an intermediate color space such as the CIE 1976 L*u*v* or CIE 1976 L*a*b* 
color spaces. The names of these color spaces may be abbreviated as CIELUV and 
CIELAB, respectively. 

For both of these color spaces it is assumed that a stimulus $(X, Y, Z)$ is formed
by a white reflecting surface that is lit by a known illuminant with tristimulus values
$(X_n, Y_n, Z_n)$. The conversion from CIE 1931 tristimulus values to CIELUV is then
given by the following.

$$
\begin{aligned}
L^* &= 116 \left( \frac{Y}{Y_n} \right)^{1/3} - 16 \\
u^* &= 13 L^* (u' - u'_n) \\
v^* &= 13 L^* (v' - v'_n)
\end{aligned}
$$

This conversion is under the constraint that $Y/Y_n > 0.008856$. For ratios smaller
than 0.008856, $L^*_m$ is applied as follows.

$$
L^*_m = 903.3 \, \frac{Y}{Y_n}
$$

The primed quantities in these equations are computed from $(X, Y, Z)$ and
$(X_n, Y_n, Z_n)$ as follows.

$$
\begin{aligned}
u' &= \frac{4X}{X + 15Y + 3Z} & u'_n &= \frac{4X_n}{X_n + 15Y_n + 3Z_n} \\
v' &= \frac{9Y}{X + 15Y + 3Z} & v'_n &= \frac{9Y_n}{X_n + 15Y_n + 3Z_n}
\end{aligned}
$$

This transformation creates a more or less uniform color space, such that equal
distances anywhere within this space encode equal perceived color differences. It is
therefore possible to measure the difference between two stimuli $(L_1^*, u_1^*, v_1^*)$ and
$(L_2^*, u_2^*, v_2^*)$ by encoding them in CIELUV space, and applying the color difference
formula

$$
\Delta E^*_{uv} = \left[ (\Delta L^*)^2 + (\Delta u^*)^2 + (\Delta v^*)^2 \right]^{1/2},
$$

where $\Delta L^* = L_1^* - L_2^*$, etc.

FIGURE 2.25 CIE (u′, v′) chromaticity diagram showing the range of colors humans can
distinguish.
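
The CIELUV conversion and its color difference formula translate directly into code. The sketch below is one possible transcription under stated assumptions: the white point is taken to be CIE D65 scaled so that Y_n = 100, the function names are hypothetical, and a stimulus with X = Y = Z = 0 is not handled because the denominator of u′ and v′ would be zero.

```python
import math


def lightness(y_ratio):
    """CIE 1976 lightness L* from the ratio Y / Yn."""
    if y_ratio > 0.008856:
        return 116.0 * y_ratio ** (1.0 / 3.0) - 16.0
    return 903.3 * y_ratio


def uv_prime(X, Y, Z):
    """Chromaticity coordinates (u', v') of a tristimulus value."""
    d = X + 15.0 * Y + 3.0 * Z
    return 4.0 * X / d, 9.0 * Y / d


def xyz_to_luv(xyz, white):
    """Convert XYZ to (L*, u*, v*) relative to the given white point."""
    L = lightness(xyz[1] / white[1])
    u, v = uv_prime(*xyz)
    un, vn = uv_prime(*white)
    return L, 13.0 * L * (u - un), 13.0 * L * (v - vn)


def delta_e(c1, c2):
    """Euclidean color difference between two (L*, u*, v*) triplets."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


# Assumed white point: CIE D65 scaled so that Yn = 100.
D65 = (95.047, 100.0, 108.883)
white_luv = xyz_to_luv(D65, D65)
```

The white point itself maps to L* = 100 with u* = v* = 0, which is a convenient sanity check for any implementation.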

In addition, u’ and v’ may be plotted on separate axes to form a chromaticity 
diagram, as shown in Figure 2.25. Equal distances in this diagram represent approx- 
imately equal perceptual differences. For this reason, in the remainder of this book 
CIE (u’, v’) chromaticity diagrams are shown rather than perceptually nonuniform 
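CIE (x, y) chromaticity diagrams.

The CIELAB color space follows a similar approach. For the ratios $X/X_n$, $Y/Y_n$,
and $Z/Z_n$, each being larger than 0.008856, the color space is defined by

$$
\begin{aligned}
L^* &= 116 \left( \frac{Y}{Y_n} \right)^{1/3} - 16 \\
a^* &= 500 \left[ \left( \frac{X}{X_n} \right)^{1/3} - \left( \frac{Y}{Y_n} \right)^{1/3} \right] \\
b^* &= 200 \left[ \left( \frac{Y}{Y_n} \right)^{1/3} - \left( \frac{Z}{Z_n} \right)^{1/3} \right]
\end{aligned}
$$

If any ratio is smaller than 0.008856, the modified quantities $L^*_m$, $a^*_m$, and $b^*_m$ may
be computed as follows.

$$
\begin{aligned}
L^*_m &= 903.3 \, \frac{Y}{Y_n} \quad \text{for } \frac{Y}{Y_n} \le 0.008856 \\
a^*_m &= 500 \left[ f\!\left( \frac{X}{X_n} \right) - f\!\left( \frac{Y}{Y_n} \right) \right] \\
b^*_m &= 200 \left[ f\!\left( \frac{Y}{Y_n} \right) - f\!\left( \frac{Z}{Z_n} \right) \right]
\end{aligned}
$$

The function $f(\cdot)$ takes a ratio as argument in the previous equations. If any of
these ratios is denoted as $r$, $f(r)$ is defined as

$$
f(r) =
\begin{cases}
r^{1/3} & \text{for } r > 0.008856 \\
7.787\, r + \dfrac{16}{116} & \text{for } r \le 0.008856.
\end{cases}
$$

Within this color space, which is also approximately perceptually linear, the difference
between two stimuli may be quantified with the following color difference
formula.

$$
\Delta E^*_{ab} = \left[ (\Delta L^*)^2 + (\Delta a^*)^2 + (\Delta b^*)^2 \right]^{1/2}
$$

The reason for the existence of both of these color spaces is largely historical. Both
color spaces are in use today, with CIELUV more common in the television and
display industries and CIELAB in the printing and materials industries [125].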
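
A compact transcription of the CIELAB definition may help to see how the two branches fit together. The following is a sketch under assumptions: the white point is taken to be CIE D65 with Y_n = 100, and the names `f` and `xyz_to_lab` are chosen for this example.

```python
def f(r):
    """Cube root for larger ratios, linear segment near zero."""
    if r > 0.008856:
        return r ** (1.0 / 3.0)
    return 7.787 * r + 16.0 / 116.0


def xyz_to_lab(xyz, white):
    """Convert XYZ to (L*, a*, b*) relative to the given white point."""
    rx, ry, rz = (c / w for c, w in zip(xyz, white))
    if ry > 0.008856:
        L = 116.0 * ry ** (1.0 / 3.0) - 16.0
    else:
        L = 903.3 * ry
    a = 500.0 * (f(rx) - f(ry))
    b = 200.0 * (f(ry) - f(rz))
    return L, a, b


# Assumed white point: CIE D65 scaled so that Yn = 100.
D65 = (95.047, 100.0, 108.883)
white_lab = xyz_to_lab(D65, D65)
dark_lab = xyz_to_lab((0.5, 0.5, 0.5), D65)
```

The two branches of `f` meet at the threshold to within rounding of the published constants, so the conversion has no visible jump between dark and light stimuli.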
Although CIELUV and CIELAB by themselves are perceptually uniform color
spaces, they may also form the basis for color appearance models. The perception
of a set of tristimulus values may be characterized by computing appearance
correlates [27]. Our definitions are based on Wyszecki and Stiles’ book Color Science
[149].


Brightness: The attribute of visual sensation according to which a visual stimu- 
lus appears to emit more or less light is called brightness, which ranges from 
bright to dim. 


Lightness: The area in which a visual stimulus is presented may appear to emit 
more or less light in proportion to a similarly illuminated area that is per- 
ceived as a white stimulus. Lightness is therefore a relative measure and may 
be seen as relative brightness. Lightness ranges from light to dark. In both 
CIELUV and CIELAB color spaces, $L^*$ is the correlate for lightness. Note that if
the luminance value of the stimulus is about 18% of $Y_n$ (i.e., $Y/Y_n = 0.18$),
the correlate for lightness becomes about 50, which is halfway on the scale
between light and dark. In other words, surfaces with 18% reflectance appear
as middle gray. In photography, 18% gray cards are often used as calibration
targets for this reason.⁴


Hue: The attribute of color perception denoted by red, green, blue, yellow,
purple, and so on is called hue. A chromatic color is perceived as possessing
hue. An achromatic color is not perceived as possessing hue. Hue angles $h_{uv}$
and $h_{ab}$ may be computed as follows.

$$
h_{uv} = \arctan \frac{v^*}{u^*}
\qquad
h_{ab} = \arctan \frac{b^*}{a^*}
$$


Chroma: A visual stimulus may be judged in terms of its difference with an
achromatic stimulus with the same brightness. This attribute of visual sensation
is called chroma. Correlates of chroma may be computed in both
CIELUV ($C^*_{uv}$) and CIELAB ($C^*_{ab}$) as follows.

$$
C^*_{uv} = \left[ (u^*)^2 + (v^*)^2 \right]^{1/2}
\qquad
C^*_{ab} = \left[ (a^*)^2 + (b^*)^2 \right]^{1/2}
$$

4 Although tradition is maintained and 18% gray cards continue to be used, the average scene reflectance is often closer
to 13%.

Saturation: Whereas chroma pertains to stimuli of equal brightness, saturation
is an attribute of visual sensation which allows the difference of a visual stimulus
and an achromatic stimulus to be judged regardless of any differences
in brightness. In CIELUV, a correlate for saturation $s^*_{uv}$ may be computed as
follows.

$$
s^*_{uv} = \frac{C^*_{uv}}{L^*}
$$

A similar correlate for saturation is not available in CIELAB.
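
The correlates for hue, chroma, and saturation are simple functions of the opponent coordinates. The sketch below is an illustration with hypothetical function names; as a design choice made here, it uses the two-argument arctangent so that the hue angle falls in the correct quadrant and is reported in degrees between 0 and 360.

```python
import math


def hue_angle(c1, c2):
    """Hue angle in degrees from (u*, v*) or (a*, b*), in [0, 360)."""
    return math.degrees(math.atan2(c2, c1)) % 360.0


def chroma(c1, c2):
    """Chroma correlate: distance from the achromatic axis."""
    return math.hypot(c1, c2)


def saturation_uv(L, u, v):
    """CIELUV saturation correlate; undefined for L* = 0."""
    return chroma(u, v) / L


h = hue_angle(1.0, 1.0)
C = chroma(3.0, 4.0)
s = saturation_uv(50.0, 30.0, 40.0)
```

A plain single-argument arctangent would map opposite hues (for instance, positive and negative a* with b* = 0) to the same angle, which is why the quadrant-aware form is used in this sketch.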


Several more color appearance models have recently appeared. The most notable 
among these are CIECAM97 [12,28,54,85], which exists in both full and simplified 
versions, and CIECAM02 [74,84]. As with the color spaces mentioned previously, 
their use is in predicting the appearance of stimuli placed in a simplified environ- 
ment. They also allow conversion of stimuli between different display media, such 
as different computer displays that may be located in different lighting environ- 
ments. These recent color appearance models are generally more complicated than 
the procedures described in this section, but are also deemed more accurate. 

The CIECAM97 and CIECAM02 color appearance models, as well as several of 
their predecessors, follow a general structure but differ in their details. We outline 
this structure using the CIECAM02 model as an example [74,84]. 

This model works under the assumption that a target patch with given relative
tristimulus value $XYZ$ is viewed on a neutral background and in the presence of
a white reflective patch, which acts as the reference white (i.e., it is the brightest
part of the environment under consideration). The background is again a field of
limited size. The remainder of the visual field is taken up by the surround. This
simple environment is lit by an illuminant with given relative tristimulus values
$X_w Y_w Z_w$. Both of these relative tristimulus values are specified as input and are
normalized between 0 and 100.
Surround    F      c        N_c
Average     1.0    0.69     1.0
Dim         0.9    0.59     0.95
Dark        0.8    0.525    0.8


TABLE 2.5 Values for intermediary parameters in the CIECAM02 model as a function
of the surround description.


The luminance measured from the reference white patch is then assumed to be
the adapting field luminance $L_a$, the only absolute input parameter, measured
in cd/m². The neutral gray background has a luminance less than or equal to the
adapting field luminance. It is denoted $Y_b$ and is specified as a fraction of $L_a$, also
normalized between 0 and 100.

The final input to the CIECAM02 color appearance model is a classifier describing
the surround as average, dim, or dark. This viewing condition parameter is used to
select values for the intermediary parameters $F$, $c$, and $N_c$ according to Table 2.5.
Further intermediary parameters $n$, $N_{bb}$, $N_{cb}$, and $z$ are computed from the input
as follows.

$$
\begin{aligned}
n &= \frac{Y_b}{Y_w} \\
N_{cb} &= 0.725 \left( \frac{1}{n} \right)^{0.2} \\
N_{bb} &= N_{cb} \\
z &= 1.48 + \sqrt{n}
\end{aligned}
$$
Next, a factor $F_L$ is computed from the adapting field luminance, which accounts
for the partial adaptation to overall light levels. This takes the following form.

$$
\begin{aligned}
k &= \frac{1}{5 L_a + 1} \\
F_L &= 0.2\, k^4 (5 L_a) + 0.1\, (1 - k^4)^2 (5 L_a)^{1/3}
\end{aligned}
\tag{2.1}
$$
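
The viewing-condition parameters can be gathered in one small function. The sketch below is an illustration only: the function name and dictionary layout are hypothetical, and the constant 0.725 for N_cb is taken from the published CIECAM02 model.

```python
import math


def viewing_parameters(L_a, Y_b, Y_w):
    """Intermediary CIECAM02 parameters derived from the viewing conditions.

    L_a: adapting field luminance in cd/m^2
    Y_b: relative luminance of the background
    Y_w: relative luminance of the reference white
    """
    n = Y_b / Y_w
    N_cb = 0.725 * (1.0 / n) ** 0.2
    N_bb = N_cb
    z = 1.48 + math.sqrt(n)
    k = 1.0 / (5.0 * L_a + 1.0)
    F_L = (0.2 * k ** 4 * (5.0 * L_a)
           + 0.1 * (1.0 - k ** 4) ** 2 * (5.0 * L_a) ** (1.0 / 3.0))
    return {"n": n, "N_bb": N_bb, "N_cb": N_cb, "z": z, "F_L": F_L}


# Example: a 20% gray background under an adapting luminance of 200 cd/m^2.
params = viewing_parameters(200.0, 20.0, 100.0)
```

For an adapting luminance of 200 cd/m², the factor F_L comes out very close to 1, because 5 L_a = 1000 and the cube root term dominates.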


The CIECAM02 color appearance model, and related models, proceed with the
following three main steps.


• Chromatic adaptation
• Nonlinear response compression
• Computation of perceptual appearance correlates


The chromatic adaptation transform is performed in the CAT02 space, outlined in
Section 2.5. The $XYZ$ and $X_w Y_w Z_w$ tristimulus values are first converted to this
space, as follows.

$$
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
M_{\mathrm{CAT02}} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
$$

Then a degree of adaptation $D$ is computed, which determines how complete the
adaptation is. It is a function of the adapting field luminance as well as the surround
(through the parameters $L_a$ and $F$). This takes the following form.

$$
D = F \left[ 1 - \frac{1}{3.6} \exp\!\left( \frac{-L_a - 42}{92} \right) \right]
$$

The chromatically adapted signals are then computed, as follows.

$$
\begin{aligned}
R_c &= R \left[ \left( D \, \frac{Y_w}{R_w} \right) + (1 - D) \right] \\
G_c &= G \left[ \left( D \, \frac{Y_w}{G_w} \right) + (1 - D) \right] \\
B_c &= B \left[ \left( D \, \frac{Y_w}{B_w} \right) + (1 - D) \right]
\end{aligned}
$$

After applying this chromatic adaptation transform, the result is converted back to
XYZ space.
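
The chromatic adaptation step can be sketched as follows. This is an illustration rather than a complete implementation: the CAT02 matrix values are those of the published CIECAM02 model and are assumed to match the matrix of Section 2.5, the conversion back to XYZ is omitted, and the function names are hypothetical.

```python
import math

# CAT02 matrix as published for CIECAM02 (assumed to match Section 2.5).
M_CAT02 = [
    (0.7328, 0.4296, -0.1624),
    (-0.7036, 1.6975, 0.0061),
    (0.0030, 0.0136, 0.9834),
]


def multiply(matrix, vec):
    """Multiply a 3-by-3 matrix with a 3-vector."""
    return tuple(sum(m * v for m, v in zip(row, vec)) for row in matrix)


def degree_of_adaptation(F, L_a):
    """Degree of adaptation D from surround factor F and luminance L_a."""
    return F * (1.0 - (1.0 / 3.6) * math.exp((-L_a - 42.0) / 92.0))


def adapt(xyz, xyz_w, D):
    """Chromatically adapted CAT02 signals (R_c, G_c, B_c)."""
    rgb = multiply(M_CAT02, xyz)
    rgb_w = multiply(M_CAT02, xyz_w)
    Y_w = xyz_w[1]
    return tuple(c * (D * Y_w / cw + (1.0 - D)) for c, cw in zip(rgb, rgb_w))


# Assumed reference white: CIE D65 scaled so that Yw = 100.
white = (95.047, 100.0, 108.883)
D = degree_of_adaptation(1.0, 200.0)
adapted_white = adapt(white, white, 1.0)
```

With complete adaptation (D = 1), the reference white maps to equal signals in all three channels, whereas D = 0 leaves the input unchanged.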

The second step of the CIECAM02 model is the nonlinear response compression,
which is carried out in the Hunt-Pointer-Estevez color space, which is close to a
cone fundamental space such as LMS (see Section 2.5). Conversion from XYZ to this
color space is governed by the following matrix.

$$
M_H =
\begin{bmatrix}
0.3897 & 0.6890 & -0.0787 \\
-0.2298 & 1.1834 & 0.0464 \\
0.0000 & 0.0000 & 1.0000
\end{bmatrix}
$$

The chromatically adapted signal after conversion to the Hunt-Pointer-Estevez color
space is indicated with the $(R' G' B')$ triplet. The nonlinear response compression
yields a compressed signal $(R'_a G'_a B'_a)$, as follows.

$$
\begin{aligned}
R'_a &= \frac{400 \, (F_L R' / 100)^{0.42}}{27.13 + (F_L R' / 100)^{0.42}} + 0.1 \\
G'_a &= \frac{400 \, (F_L G' / 100)^{0.42}}{27.13 + (F_L G' / 100)^{0.42}} + 0.1 \\
B'_a &= \frac{400 \, (F_L B' / 100)^{0.42}}{27.13 + (F_L B' / 100)^{0.42}} + 0.1
\end{aligned}
$$

This response compression function follows an S shape on a log-log plot, as shown
in Figure 2.26.
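
The response compression is the same scalar function applied to each channel. The sketch below is a minimal version that assumes non-negative inputs (negative signals, which can occur for some stimuli, are not handled here); the function name `compress` is hypothetical.

```python
def compress(value, F_L):
    """Nonlinear response compression of one Hunt-Pointer-Estevez channel.

    Assumes value >= 0; F_L is the luminance-level adaptation factor.
    """
    x = (F_L * value / 100.0) ** 0.42
    return 400.0 * x / (27.13 + x) + 0.1


# Responses for a few input levels with F_L = 1.
levels = [0.0, 1.0, 100.0, 10000.0]
responses = [compress(v, 1.0) for v in levels]
```

The output starts at 0.1 for a zero input, increases monotonically, and approaches 400.1 for very large inputs, which gives the S shape on log-log axes.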

The final step consists of computing perceptual appearance correlates. These describe
the perception of the patch in its environment, and include lightness, brightness,
hue, chroma, colorfulness, and saturation. First a set of intermediary parameters
is computed, as follows, which includes a set of color opponent signals $a$ and $b$,
a magnitude parameter $t$, an achromatic response $A$, hue angle $h$, and eccentricity
factor $e$.

$$
\begin{aligned}
a &= R'_a - 12 G'_a / 11 + B'_a / 11 \\
b &= (R'_a + G'_a - 2 B'_a) / 9 \\
h &= \tan^{-1}(b / a) \\
A &= \left[ 2 R'_a + G'_a + B'_a / 20 - 0.305 \right] N_{bb}
\end{aligned}
$$

FIGURE 2.26 CIECAM02 nonlinear response compression on log-log axes (horizontal axis: log R′).


The unique hues red, yellow, green, and blue have values for $h$ as given in Table 2.6.
The hue angles $h_1$ and $h_2$ for the two nearest unique hues are determined
from the value of $h$ and Table 2.6. Similarly, eccentricity factors $e_1$ and $e_2$ are derived
from this table and the value of $e$. The hue composition term $H_1$ of the next
lower unique hue is also read from this table. The appearance correlates may then be
computed with the following equations, which are estimates for hue $H$, lightness
$J$, brightness $Q$, chroma $C$, colorfulness $M$, and saturation $s$.

$$
H = H_1 + \frac{100 \, (h - h_1) / e_1}{(h - h_1) / e_1 + (h_2 - h) / e_2}
$$

$$
J = 100 \left( \frac{A}{A_w} \right)^{c z}
$$

Here, $A_w$ is the achromatic response computed for the reference white.

Unique Hue    Hue Angle    Eccentricity Factor    Hue Composition
Red           20.14        0.8                    0
Yellow        90.00        0.7                    100
Green         164.25       1.0                    200
Blue          237.53       1.2                    300


TABLE 2.6 Hue angles $h$, eccentricity factors $e$, and hue composition $H_i$ for the
unique hues red, yellow, green, and blue.
These appearance correlates thus describe the tristimulus value $XYZ$ in the context
of its environment. Thus, by changing only the environment, the perception of this
patch will change, and this will be reflected in the values found for these appearance
correlates. In practice, this would occur, for instance, when an image displayed on
a monitor and printed on a printer needs to appear the same. Although colorimetry
may account for the different primaries of the two devices, color appearance
modeling additionally predicts differences in color perception due to the state of
adaptation of the human observer in both viewing conditions.

If source and target viewing conditions are known, color appearance models 
may be used to convert a tristimulus value from one viewing condition to the 
other. The first two steps of the model (chromatic adaptation and nonlinear re- 
sponse compression) would then be applied, followed by the inverse of these two 
steps. During execution of the inverse model, the parameters describing the target 
environment (adapting field luminance, tristimulus value of the reference white, 
and so on) would be substituted into the model. 

The field of color appearance modeling is currently dominated by two trends. 
The first is that there is a realization that the visual environment in which a stimulus 
is observed is in practice much more complicated than a uniform field with a given 
luminance. In particular, recent models are aimed at modeling the appearance of a 
pixel’s tristimulus values in the presence of neighboring pixels in an image. Exam- 
ples of models that begin to address these spatial configurations are the S-CIELAB 
and iCAM models [29,30,61,86,151]. 

A second trend in color appearance modeling constitutes a novel interest in ap- 
plying color appearance models to HDR data. In particular, there is a mismatch in 
adaptation of the human visual system in a typical scene involving high contrast 
ratios and a human observer in front of a typical display device. Thus, if an accurate 
HDR capture of a scene is tone mapped and displayed on a computer monitor, the 
state of adaptation of the human observer in the latter case may cause the scene to 
appear different from the original scene. 

The iCAM “image appearance model,” derived from CIECAM02, is specifically 
aimed at addressing these issues [29,61], and in fact may be seen as a tone- 
reproduction operator. This model is presented in detail in Chapter 8. 


2.9 DISPLAY GAMMA 


Cathode ray tubes have a nonlinear relationship between input voltage $V$ and
light output $L_v$. This relationship is well approximated with the following power
law function.

$$
L_v = k V^{\gamma}
$$

The exponent $\gamma$ models the nonlinearity introduced by the specific operation of the
CRT, and is different for different monitors. If $V$ is normalized between 0 and 1,
the constant $k$ simply becomes the maximum output luminance $L_{\max}$.

In practice, typical monitors have a gamma value between 2.4 and 2.8. How- 
ever, further nonlinearities may be introduced by the lookup tables used to con- 
vert values into voltages. For instance, Macintosh computers have a default gamma 
of about 1.8, which is achieved by the interaction of a system lookup table with 
the attached display device. Whereas the Macintosh display system may have a 
gamma of 1.8, the monitor attached to a Macintosh will still have a gamma closer 
to 2.5 [100]. 

Thus, starting with a linear set of values that are sent to a CRT display, the result is 
a nonlinear set of luminance values. For the luminances produced by the monitor to 
be linear, the gamma of the display system needs to be taken into account. To undo 
the effect of gamma, the image data needs to be gamma corrected before sending it
to the display, as explained below.

Before the gamma value of the display can be measured, the black level needs 
to be set appropriately [100]. To set the black point on a monitor, you first display 
a predominantly black image and adjust the brightness control on the monitor to 
its minimum. You then increase its value until the black image just starts to deviate 
from black. The contrast control may then be used to maximize the amount of 
contrast. 

The gamma value of a display device may then be estimated, as in the image 
shown in Figure 2.27. Based on an original idea by Paul Haeberli, this figure consists 
of alternating black and white lines on one side and solid gray patches on the other. 
By viewing this chart from a reasonable distance and matching the solid gray that 
comes closest to the gray formed by fusing the alternating black and white lines, 
the gamma value for the display device may be read from the chart. Note that this 
gamma estimation chart should only be used for displays that follow a power-law 
transfer function, such as CRT monitors. This gamma estimation technique may not 
work for LCD displays, which do not follow a simple power law. 

FIGURE 2.27 Gamma estimation for CRT displays. The alternating black and white lines
should be matched to the solid grays to determine the gamma of a display device.

Once the gamma value for the display is known, images may be pre-corrected
before sending them to the display device. This is achieved by applying the following
correction to the values in the image, which should contain normalized values
between 0 and 1.

$$
\begin{aligned}
R' &= R^{1/\gamma} \\
G' &= G^{1/\gamma} \\
B' &= B^{1/\gamma}
\end{aligned}
$$

An image corrected with different gamma values is shown in Figure 2.28. 
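
The interplay between gamma correction and the display's power law can be checked numerically. The sketch below is a simple illustration with hypothetical function names: it pre-corrects a pixel and then passes it through an idealized power-law display, assuming the same gamma value in both places and a maximum luminance of 100 cd/m².

```python
def gamma_correct(rgb, gamma=2.2):
    """Pre-correct normalized linear RGB values for a power-law display."""
    return tuple(c ** (1.0 / gamma) for c in rgb)


def display_luminance(v, gamma, L_max):
    """Idealized display model: luminance produced for a normalized input v."""
    return L_max * v ** gamma


linear = (0.5, 0.25, 1.0)
corrected = gamma_correct(linear, 2.2)
shown = tuple(display_luminance(v, 2.2, 100.0) for v in corrected)
```

Because the two exponents cancel, the displayed luminances are proportional to the original linear values, which is the purpose of the correction.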

The technology employed in LCD display devices is fundamentally different from 
CRT displays, and the transfer function for such devices is often very different. How- 
ever, many LCD display devices incorporate circuitry to mimic the transfer function 
of a CRT display device. This provides some backward compatibility. Thus, although 
gamma encoding is specifically aimed at correcting for the nonlinear transfer func- 
tion of CRT devices, often (but not always) gamma correction may be applied to 
images prior to display on LCD. 

Many display programs perform incomplete gamma correction (i.e., the image 
is corrected such that the displayed material is intentionally left nonlinear). Often, 
a gamma value of 2.2 is used. The effect of incomplete gamma correction is that 
contrast is boosted, which viewers tend to prefer [29]. In addition, display devices 
reflect some of their environment, which reduces contrast. Partial gamma correction 
may help regain some of this loss of contrast [145]. 
FIGURE 2.28 An image corrected with different gamma values. In reading order: γ = 1.0,
1.5, 2.0, and 2.5. (Image courtesy of the Albin Polasek museum, Winter Park, Florida.)
One of the main advantages of using gamma encoding is that it reduces visible
noise and quantization artifacts by mimicking the human contrast sensitivity curve.
However, gamma correction and gamma encoding are separate issues, as explained
next.


2.10 BRIGHTNESS ENCODING 


Digital color encoding requires quantization, and errors are inevitable during this 
process. In the case of a quantized color space, it is preferable for reasons of per- 
ceptual uniformity to establish a nonlinear relationship between color values and 
the intensity or luminance. The goal is to keep errors below the visible threshold as 
much as possible. 

The eye has a nonlinear response to brightness. That is, at most adaptation levels,
brightness is perceived roughly as the cube root of intensity (see, for instance, the
encoding of $L^*$ of the CIELAB and CIELUV color spaces in Section 2.8). Applying a
linear quantization of color values would yield more visible steps in darker regions
than in the brighter regions, as shown in Figure 2.29.⁵ A power-law encoding
with a γ value of 2.2 produces a much more even distribution of quantization
steps, although the behavior near black is still not ideal. For this reason and others,
some encodings (such as sRGB) add a short linear range of values near zero (see
Section 2.11).

However, such encodings may not be efficient when luminance values range over 
several thousand or even a million to one. Simply adding bits to a gamma encoding 
does not result in a good distribution of steps, because it can no longer be assumed 
that the viewer is adapted to a particular luminance level, and the relative quantiza- 
tion error continues to increase as the luminance gets smaller. A gamma encoding 
does not hold enough information at the low end to allow exposure readjustment 
without introducing visible quantization artifacts. 

5 We have chosen a quantization to 6 bits to emphasize the visible steps.

FIGURE 2.29 Perception of quantization steps using a linear and a gamma (γ = 2.2) encoding,
plotted as perceived brightness against quantized value. Only 6 bits are used in this example
encoding to make the banding more apparent, but the same effect takes place in smaller steps
using 8 bits per primary.

To encompass a large range of values when the adaptation luminance is unknown,
an encoding with a constant or nearly constant relative error is required.
A log encoding quantizes values using the following formula rather than the power
law cited earlier.

$$I_{out} = I_{min} \left[ \frac{I_{max}}{I_{min}} \right]^{v}$$

This formula assumes that the encoded value v is normalized between 0 and 1, and
is quantized in uniform steps over this range. Adjacent values in this encoding thus
differ by a constant factor equal to

$$\left[ \frac{I_{max}}{I_{min}} \right]^{1/N},$$

where N is the number of steps in the quantization. This is in contrast to a gamma
encoding, whose relative step size varies over its range, tending toward infinity at 
zero. The advantage of constant steps is offset by a minimum representable value, 
Imin, in addition to the maximum intensity we had before. 
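
As an illustration of the log encoding, the sketch below quantizes and recovers intensities with the formula above. The function names, the clamping of out-of-range input, and the example range (0.001 to 100 in 4,095 steps) are our assumptions rather than part of any standard.

```python
import math


def log_encode(intensity, i_min, i_max, n_steps):
    """Map an intensity in [i_min, i_max] to an integer code in [0, n_steps]."""
    v = math.log(intensity / i_min) / math.log(i_max / i_min)
    v = min(max(v, 0.0), 1.0)  # clamp out-of-range input (our choice)
    return round(v * n_steps)


def log_decode(code, i_min, i_max, n_steps):
    """Recover intensity via I_out = I_min * (I_max / I_min) ** v."""
    return i_min * (i_max / i_min) ** (code / n_steps)


I_MIN, I_MAX, N_STEPS = 0.001, 100.0, 4095  # 12-bit example values

# Adjacent codes differ by the constant factor (I_max / I_min) ** (1 / N).
step_ratio = (log_decode(101, I_MIN, I_MAX, N_STEPS)
              / log_decode(100, I_MIN, I_MAX, N_STEPS))
print(step_ratio, (I_MAX / I_MIN) ** (1.0 / N_STEPS))
```

With these example values the factor between adjacent codes is about 1.0028, that is, a relative step of roughly 0.28% everywhere in the range.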

Another alternative closely related to the log encoding is a separate exponent and 
mantissa representation, better known as floating point. Floating-point representa- 
tions do not have perfectly equal step sizes but follow a slight sawtooth pattern in 
their error envelope, as shown in Figure 2.30. To illustrate the quantization
differences between gamma, log, and floating-point encodings, a bit size (12) and
range (0.001 to 100) are chosen that can be reasonably covered by all three types.
A floating-point representation with 4 bits in the exponent, 8 bits in the mantissa,
and no sign bit is chosen, because only positive values are required to represent
light.

FIGURE 2.30 Relative error percentage plotted against log10 of image value for three encoding
methods.

By denormalizing the mantissa at the bottom end of the range, values between 
Imin and zero may also be represented in a linear fashion, as shown in this fig- 
ure. By comparison, the error envelope of the log encoding is constant over the 
full range, whereas the gamma encoding error increases dramatically after just two 
orders of magnitude. Using a larger constant for γ helps this situation somewhat,
but ultimately gamma encodings are not well suited to full HDR imagery where the 
input and/or output ranges are unknown. 
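
The contrast between the gamma and log error envelopes can be reproduced with a few lines of code. This is a rough sketch under our own assumptions: 12 bits, the range 0.001 to 100 used above, γ = 2.2, and the relative step size (difference between adjacent decoded values divided by the lower value) as the measure. The floating-point encoding is omitted for brevity, and all names are illustrative.

```python
I_MIN, I_MAX, BITS = 0.001, 100.0, 12
N = (1 << BITS) - 1  # highest code value


def gamma_decode(code, gamma=2.2):
    """Decode a gamma-encoded value; code 0 maps to zero intensity."""
    return I_MAX * (code / N) ** gamma


def log_decode(code):
    """Decode a log-encoded value; code 0 maps to I_MIN."""
    return I_MIN * (I_MAX / I_MIN) ** (code / N)


def relative_step(decode, code):
    """Difference between adjacent decoded values, relative to the lower one."""
    lo, hi = decode(code), decode(code + 1)
    return (hi - lo) / lo


def gamma_code_for(intensity, gamma=2.2):
    """Gamma code at or just below the given intensity (at least 1)."""
    return max(1, int(N * (intensity / I_MAX) ** (1.0 / gamma)))


for intensity in (0.001, 0.1, 10.0):
    g = relative_step(gamma_decode, gamma_code_for(intensity))
    l = relative_step(log_decode, 100)  # identical for every code
    print(f"I = {intensity:7.3f}   gamma step = {g:.4%}   log step = {l:.4%}")
```

The log step is the same at every code, whereas the gamma step is smaller than the log step near the top of the range and grows well past the 5% level at the bottom.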


2.11 STANDARD RGB COLOR SPACES 


Most capture and display devices have their own native color space, generically 
referred to as device-dependent RGB. Although it is entirely possible to convert an 
image between two device-dependent color spaces, it is more convenient to define 
a single standard color space that can serve as an intermediary between device- 
dependent color spaces. 

On the positive side, such standards are now available. On the negative side, there 
is not one single standard but several competing standards. Most image encodings 
fall into a class called output-referred standards, meaning that they employ a color space 
corresponding to a particular output device rather than to the original scene they 
are meant to represent. The advantage of such a standard is that it does not require 
any manipulation prior to display on a targeted device, and it does not “waste” 
resources on colors that are out of this device gamut. Conversely, the disadvantage 
of such a standard is that it cannot represent colors that may be displayable on other 
output devices or that may be useful in image processing operations along the way. 

6 Floating-point denormalization refers to the linear representation of values whose exponent is at the minimum. The
mantissa is allowed to have a zero leading bit, which is otherwise assumed to be 1 for normalized values, and this leads
to a steady increase in relative error at the very bottom end, rather than an abrupt cutoff.

A scene-referred standard follows a different philosophy, which is to represent the
original captured scene values as closely as possible. Display on a particular output
device then requires some method of mapping the pixels to the device’s gamut. This
operation is referred to as tone mapping, which may be as simple as clamping RGB 
values to a 0-to-1 range or something more sophisticated, such as compressing the 
dynamic range or simulating human visual abilities and disabilities (see Chapters 
6 through 8). The chief advantage gained by moving tone mapping to the image 
decoding and display stage is that correct output can be produced for any display 
device, now and in the future. In addition, there is the freedom to apply complex 
image operations without suffering losses due to a presumed range of values. 

The challenge of encoding a scene-referred standard is finding an efficient rep- 
resentation that covers the full range of color values. This is precisely where HDR 
image encodings come into play, as discussed in Chapter 3. 

For reference, we discuss several current output-referred standards. In Section 2.4,
we already introduced the ITU-R RGB color space. In the remainder of this
section, conversions to several other color spaces are introduced. Such conversions
all consist of a matrix multiplication followed by a nonlinear encoding. The sRGB color
space is introduced as an example, before generalizing the concept to other color 
spaces. 

The nonlinear sRGB color space is based on a virtual display. It is a standard 
specified by the International Electrotechnical Commission (IEC 61966-2-1). The 
primaries as well as the white point are specified in terms of xy chromaticities 
according to Table 2.7 (this table also shows information for other color spaces, 
discussed in material following). The maximum luminance for white is specified as 
80 cd/m².

Because the specification of sRGB is with respect to a virtual monitor, it includes 
a nonlinearity similar to gamma correction. This makes sRGB suitable for Internet 
applications as well as scanner-to-printer applications. Many digital cameras now 
produce images in sRGB space. Because this color space already includes a nonlinear 
transfer function, images produced by such cameras may be displayed directly on 
typical monitors. There is generally no further need for gamma correction, except 
perhaps in critical viewing applications. 

Color Space              R        G        B        White Point (Illuminant)
Adobe RGB (1998)    x    0.6400   0.2100   0.1500   D65    0.3127
                    y    0.3300   0.7100   0.0600          0.3290
sRGB                x    0.6400   0.3000   0.1500   D65    0.3127
                    y    0.3300   0.6000   0.0600          0.3290
HDTV (HD-CIF)       x    0.6400   0.3000   0.1500   D65    0.3127
                    y    0.3300   0.6000   0.0600          0.3290
NTSC (1953)         x    0.6700   0.2100   0.1400   C      0.3101
                    y    0.3300   0.7100   0.0800          0.3161
SMPTE-C             x    0.6300   0.3100   0.1550   D65    0.3127
                    y    0.3400   0.5950   0.0700          0.3290
PAL/SECAM           x    0.6400   0.2900   0.1500   D65    0.3127
                    y    0.3300   0.6000   0.0600          0.3290
Wide gamut          x    0.7347   0.1152   0.1566   D50    0.3457
                    y    0.2653   0.8264   0.0177          0.3584

TABLE 2.7 Chromaticity coordinates for primaries and white points defining several
RGB color spaces.

The conversion of CIE XYZ tristimulus values to sRGB consists of a 3-by-3 matrix
multiplication followed by a nonlinear transfer function. The linear part of the
transform is identical to the matrix specified in ITU-R BT.709, introduced in
Section 2.4. The resulting RGB values are converted into sRGB using the following
transfer function (for R, G, and B > 0.0031308).

$$\begin{aligned}
R_{sRGB} &= 1.055\, R^{1/2.4} - 0.055 \\
G_{sRGB} &= 1.055\, G^{1/2.4} - 0.055 \\
B_{sRGB} &= 1.055\, B^{1/2.4} - 0.055
\end{aligned}$$

For values smaller than 0.0031308, a linear function is specified, as follows.

$$\begin{aligned}
R_{sRGB} &= 12.92\, R \\
G_{sRGB} &= 12.92\, G \\
B_{sRGB} &= 12.92\, B
\end{aligned}$$
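
The two branches of the sRGB transfer function translate directly into code. The sketch below applies them to a single linear channel value in the 0-to-1 range. The function names are ours, and the inverse function is simply the algebraic inverse of the two branches, added for illustration.

```python
def linear_to_srgb(c):
    """Encode one linear channel value (0..1) with the sRGB transfer function."""
    if c <= 0.0031308:
        return 12.92 * c                      # short linear segment near black
    return 1.055 * c ** (1.0 / 2.4) - 0.055   # power-law segment


def srgb_to_linear(e):
    """Invert linear_to_srgb (derived algebraically from the two branches)."""
    if e <= 12.92 * 0.0031308:
        return e / 12.92
    return ((e + 0.055) / 1.055) ** 2.4


for value in (0.0, 0.001, 0.18, 1.0):
    encoded = linear_to_srgb(value)
    print(value, encoded, round(encoded * 255))  # 8-bit code for reference
```

The two branches meet almost exactly at the threshold of 0.0031308, so the encoded curve has no visible jump there.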


This conversion follows a general pattern that is found in other standards. First, a 
3-by-3 matrix is defined, which transforms from XYZ to a color space with different 
primaries. Then a nonlinear transform is applied to the tristimulus values. This 
transform takes the following general form [93]. 


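$$R' = \begin{cases} (1+f)\,R^{\gamma} - f & \text{for } t < R \le 1 \\ s\,R & \text{for } 0 \le R \le t \end{cases}$$

$$G' = \begin{cases} (1+f)\,G^{\gamma} - f & \text{for } t < G \le 1 \\ s\,G & \text{for } 0 \le G \le t \end{cases}$$

$$B' = \begin{cases} (1+f)\,B^{\gamma} - f & \text{for } t < B \le 1 \\ s\,B & \text{for } 0 \le B \le t \end{cases}$$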


Note that the conversion is linear in a small dark region, and follows a gamma curve 
for the remainder of the range. The value of s determines the slope of the linear 
segment, and f is a small offset. Table 2.8 lists several RGB standards, which are 
defined by their conversion matrices as well as their nonlinear transform specified 
by the γ, f, s, and t parameters [93]. The primaries and white points for each
color space are outlined in Table 2.7. The gamuts spanned by each color space are 
shown in Figure 2.31. The gamut for the HDTV color space is identical to the sRGB 
standard and is therefore not shown again. 
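
The conversion matrix for any of these color spaces can be derived from the chromaticities in Table 2.7. The sketch below uses the standard construction: each primary is scaled so that R = G = B = 1 reproduces the white point at Y = 1. It yields the RGB-to-XYZ matrix, and the XYZ-to-RGB matrix is its inverse. The helper names are ours, and Cramer's rule is used only to keep the example free of dependencies.

```python
def xyz_from_xy(x, y):
    """Chromaticity (x, y) to XYZ tristimulus values normalized to Y = 1."""
    return (x / y, 1.0, (1.0 - x - y) / y)


def det3(m):
    """Determinant of a 3-by-3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve m * s = b for s with Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [[b[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution


def rgb_to_xyz_matrix(primaries, white):
    """Build the linear RGB-to-XYZ matrix from xy chromaticities."""
    columns = [xyz_from_xy(x, y) for (x, y) in primaries]
    p = [[columns[c][r] for c in range(3)] for r in range(3)]
    scale = solve3(p, xyz_from_xy(*white))  # luminance weight of each primary
    return [[p[r][c] * scale[c] for c in range(3)] for r in range(3)]


# sRGB / HDTV primaries and D65 white point, as listed in Table 2.7.
SRGB_PRIMARIES = [(0.6400, 0.3300), (0.3000, 0.6000), (0.1500, 0.0600)]
D65_WHITE = (0.3127, 0.3290)

matrix = rgb_to_xyz_matrix(SRGB_PRIMARIES, D65_WHITE)
for row in matrix:
    print(["%.4f" % value for value in row])
```

The middle row of the result holds the luminance weights of the three primaries, which sum to 1 by construction.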

The Adobe RGB color space was formerly known as SMPTE-240M, but was re- 
named after SMPTE’s gamut was reduced. It has a larger gamut than sRGB, as shown 
in the chromaticity diagrams of Figure 2.31. This color space was developed with 
the printing industry in mind. Many digital cameras provide an option to output 
images in Adobe RGB color, as well as sRGB. 

The HDTV and sRGB standards specify identical primaries, but differ in their
definition of viewing conditions. As such, the difference lies in the nonlinear
transform.

The National Television System Committee (NTSC) standard was used as the
color space for TV in North America. It has now been replaced with SMPTE-C to
match phosphors in current display devices, which are more efficient and brighter.
Phase Alternating Line (PAL) and Systeme Electronique Couleur Avec Memoire
(SECAM) are the standards used for television in Europe.

Finally, the Wide gamut color space is shown for comparison [93]. Its primaries 
are monochromatic light sources with wavelengths of 450, 525, and 700 nm. This 
color space is much closer to the spectrally sharpened chromatic adaptation trans- 
forms discussed in Section 2.5. 


HDR Image Encodings 


An important consideration for any digital image is how to store it. This is
especially true for HDR images, which record a much wider gamut than standard
24-bit RGB and therefore require an efficient encoding to avoid taking an excess
of disk space and network bandwidth. Fortunately, several HDR file encodings and
formats have already been developed by the graphics community. A few of these
formats have been in use for decades, whereas others have just recently been
introduced to the public. Our discussion includes these existing encodings, as
well as encodings that lie on the horizon.

An encoding is defined as the raw bit representation of a pixel value, whereas a 
format includes whatever wrapper goes around these pixels to compose a complete 
image. The quality of the results is largely determined by the encoding, rather than 
the format, making encodings the focus of this chapter. File formats that include 
some type of “lossy” compression are the exceptions to this rule, and must be 
considered and evaluated as a whole. Lossy HDR formats are only starting to appear 
at the time of writing, making any comparisons premature. We simply introduce 
the basic concepts. 


3.1 LDR VERSUS HDR ENCODINGS 


There is more than bit depth to defining the difference between HDR and LDR 
encodings. Specifically, a 24-bit RGB image is usually classified as an output-referred 
standard, because its colors are associated with some target output device. In
contrast, most HDR images are scene-referred, in that their pixels have a direct relation to
radiance in some scene, either real or virtual. This is logical, because most output
devices are low in dynamic range, whereas most scenes are high in dynamic range. 
One cannot refer a color encoding to scene values if those values are beyond what 
can be represented, and thus LDR images are inappropriate for scene-referred data. 
On the other hand, an HDR encoding could be used to hold output-referred data, but 
there would be little sense in it because scene-referred data can always be mapped to 
a particular output device, but not the reverse. A scene-referred to output-referred 
transformation is a one-way street for the simple reason that no output device can 
reproduce all we see in the real world. This transformation is called tone mapping, 
a topic we return to frequently in this book. (See Chapters 6 through 8.) 

Having just introduced this standard term, it is important to realize that scene- 
referred is really a misnomer, because no image format ever attempts to record all 
of the light projected from a scene. In most cases, there would be little sense in 
recording infrared and ultraviolet wavelengths, or even completely sampling the 
visible spectrum, because the eye is trichromatic. As explained in Chapter 2, this 
means that it is sufficient to record three color channels in order to reproduce every 
color visible to a human observer. These may be defined by the CIE XYZ tristimulus
space or any equivalent three-primary space (e.g., RGB, YCbCr, CIELUV, and
so on). Because we are really interested in what people see, as opposed to what 
is available, it would be better to use the term human-referred or perceptual for HDR 
encodings. 

Nonetheless, the term scene-referred is still preferred, because sometimes we do 
wish to record more than the eye can see. Example applications for extrasensory 
data include the following. 


• Satellite imagery, in which the different wavelengths may be analyzed and
visualized in false color

• Physically-based rendering, in which lighting and texture maps interact to
produce new colors in combination

• Scientific and medical visualization, in which (abstract) data is collected and
visualized


In such applications, we need to record more than we could see of a scene with our 
naked eye, and HDR formats are a necessary means in accomplishing this. Further 
applications of HDR imagery are outlined in the following section.


3.2 APPLICATIONS OF HDR IMAGES


The demands placed on an HDR encoding vary substantially from one application to 
another. In an Internet application, file size might be the deciding factor. In an image 
database, it might be decoding efficiency. In an image-compositing system, accuracy 
might be most critical. The following describe some of the many applications for 
HDR, along with a discussion of their requirements. 


Physically-based rendering (global illumination): Perhaps the first applications to use HDR
images, physically-based rendering and lighting simulation programs must store
the absolute radiometric quantities for further analysis and for perceptually based 
tone mapping [142,143]. In some cases, it is important to record more than 
what is visible to the human eye, as interactions between source and surface 
spectra multiply together. Additional accuracy may also be required of the en- 
coding to avoid accumulated errors, and alpha and depth channels may also be 
desirable. A wide dynamic range is necessary for image-based lighting, especially in 
environments that include daylight. (See Chapter 9.) 


Remote sensing: As mentioned in the previous section, satellite imagery often con- 
tains much more than is visible to the naked eye [75]. HDR is important for 
these images, as is multispectral recording and the ability to annotate using image 
metadata. Accuracy requirements may vary with the type of data being recorded, 
and flexibility is the key. 


Digital photography: Camera makers are already heading in the direction of scene- 
referred data with their various RAW formats, but these are cumbersome and 
inconvenient compared to the standard encodings described in this chapter.¹ It
is only a matter of time before cameras that directly write HDR images begin to 
appear on the market. File size is clearly critical to this application. Software com- 
patibility is also important, although this aspect is largely neglected by camera 
RAW formats. Adobe’s Digital Negative specification and software works toward 
alleviating this problem [3]. 


1 Each camera manufacturer employs its own proprietary format, which is usually not compatible with other manufacturers’ 
RAW formats or even with popular image-editing software. These formats are collectively called RAW, because the 
camera’s firmware applies only minimal processing to the data that are read from the sensor.

Image editing: Image-editing applications with support for HDR image data are
now available. Photoshop CS2 incorporates reading and writing of 32-bit pixel
data, as does Photogenics (www.idruna.com), and a free open-source application
called Cinepaint (www.cinepaint.org). A vast number of image-editing operations
are possible on HDR data that are either difficult or impossible using standard
output-referred data, such as adding and subtracting pixels without running
under or over range, extreme color and contrast changes, and white balancing that
works. Accuracy will be an important requirement here, again to avoid
accumulating errors, but users will also expect support for all existing HDR image
formats.


Digital cinema (and video): Digital cinema is an important and fast-moving appli- 
cation for HDR imagery. Currently, the trend is heading in the direction of a 
medium-dynamic-range output-referred standard for digital film distribution. 
Film editing and production, however, will be done in some HDR format that 
is either scene-referred or has some intermediate reference, such as movie film 
stock. For intermediate work, resolution and color accuracy are critical, but file 
size is also a consideration in that there are over 200,000 frames in a two-hour 
movie, and each of these may be composited from dozens of intermediate lay- 
ers. Rendering a digital movie in HDR also permits HDR projection. (See Chap- 
ter 5, on display devices.) Looking further ahead, an exciting possibility is that 
HDR video may eventually reach the small screen. At least one HDR MPEG ex- 
tension has already been proposed, which we discuss at the end of the next 
section. 


Virtual reality: Many web experiences require the efficient transmission of im- 
ages, which are usually encoded as JPEG or some other lossy representation. In 
cases in which a user is attempting to view or move around a virtual space, 
image exposure is often a problem. If there were a version of QuicktimeVR 
that worked in HDR, these problems could be solved. Establishing standards for 
lossy HDR compression is therefore a high priority for virtual reality on the 
Web.

Format  Encoding(s)  Compression   Metadata                     Support/Licensing
HDR     RGBE         Run-length    Calibration, color space,    Open source software (Radiance),
        XYZE         Run-length    +user-defined                quick implementation
TIFF    IEEE RGB     None          Calibration, color space,    Public domain library (libtiff)
        LogLuv24     None          +registered, +user-defined
        LogLuv32     Run-length
EXR     Half RGB     Wavelet, ZIP  Calibration, color space,    Open source library (OpenEXR)
                                   +windowing, +user-defined

TABLE 3.1 Established HDR image file formats.


Each of these applications, and HDR applications not yet conceived, carries its own
particular requirements for image storage. The following section lists and compares
the established HDR formats and discusses upcoming formats.


3.3 HDR IMAGE FORMATS 


Table 3.1 lists three existing HDR image formats and compares some of their key
attributes. The encodings within these formats are broken out in Table 3.2, where
the basic parameters are given. In some cases, one format may support multiple
encodings (e.g., TIFF). In other cases, we list encodings that have not yet appeared
in any format but are the subject of published standards (e.g., scRGB). The standard
24-bit RGB (sRGB) encoding is also included in Table 3.2, as a point of comparison.

Encoding   Color Space          Bits/pixel  Dynamic Range (log10)  Relative Step
sRGB       RGB in [0,1] range   24          1.6 orders             Variable
RGBE       Positive RGB         32          76 orders              1.0%
XYZE       (CIE) XYZ            32          76 orders              1.0%
IEEE RGB   RGB                  96          79 orders              0.000003%
LogLuv24   Log Y + (u′,v′)      24          4.8 orders             1.1%
LogLuv32   Log Y + (u′,v′)      32          38 orders              0.3%
Half RGB   RGB                  48          10.7 orders            0.1%
scRGB48    RGB                  48          3.5 orders             Variable
scRGB-nl   RGB                  36          3.2 orders             Variable
scYCC-nl   YCbCr                36          3.2 orders             Variable

TABLE 3.2 HDR pixel encodings, in order of introduction.


Formats based on logarithmic encodings, LogLuv24 and LogLuv32, maintain a
constant relative error over their entire range.² For the most part, the floating-point
encodings RGBE, XYZE, IEEE RGB, and Half RGB also maintain a constant relative
error. The dynamic ranges quoted for the encodings sRGB, scRGB48, scRGB-nl, and
scYCC-nl are based on the point at which their relative steps pass 5%. Above 5%,
adjacent steps in the encoding are easily distinguished. If one were to view an sRGB
image on an HDR display, regions below 0.025 of the maximum would exhibit
visible banding, similar to that shown in Figure 3.1.³ For luminance quantization
to be completely invisible, the relative step size must be held under 1% [149]. This
is the goal of most HDR encodings, and some have relative step sizes considerably
below this level. Pixel encodings with variable quantization steps are difficult to
characterize in terms of their maximum dynamic range and are ill suited for HDR
applications in which the display brightness scaling is not predetermined.

2 Relative step size is the difference between adjacent values divided by the value. The relative error is generally held to
half the relative step size, and is the difference between the correct value and the representation divided by the correct
value.

3 Thus, the dynamic range of sRGB is 0.025:1, which is the same ratio as 1:10^1.6. In Table 3.2, we just report the number
of orders (powers of 10).

FIGURE 3.1 Banding due to quantization at the 5% level.


3.3.1 THE HDR FORMAT 


The HDR format, originally known as the Radiance picture format (.hdr, .pic), was
first introduced as part of the Radiance lighting simulation and rendering system in
1989 [144], and has since found widespread use in the graphics community,
particularly for HDR photography and image-based lighting [17,18]. (See Chapters 4
and 9.) The file wrapper consists of a short ASCII header, followed by a resolution
string that defines the image size and orientation, followed by the run-length
encoded pixel data. The pixel data comes in two flavors: a 4-byte RGBE encoding [138]
and a CIE variant, XYZE. The bit breakdown is shown in Figure 3.2.

FIGURE 3.2 Bit breakdown for the 32-bit/pixel RGBE (and XYZE) encodings (fields: Red,
Green, Blue, Exponent).

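The RGBE components $R_M$, $G_M$, and $B_M$ are converted from the scene-referred
color ($R_W$, $G_W$, $B_W$) via the following formula.

$$\begin{aligned}
E &= \left\lceil \log_2\left(\max(R_W, G_W, B_W)\right) + 128 \right\rceil \\
R_M &= \left\lfloor \frac{256\, R_W}{2^{E-128}} \right\rfloor \\
G_M &= \left\lfloor \frac{256\, G_W}{2^{E-128}} \right\rfloor \\
B_M &= \left\lfloor \frac{256\, B_W}{2^{E-128}} \right\rfloor
\end{aligned}$$

There is also a special case for an input in which $\max(R_W, G_W, B_W)$ is less than
$10^{-38}$, which is written out as (0, 0, 0, 0). This gets translated to (0, 0, 0) on the
reverse conversion. The reverse conversion for the normal case is as follows.

$$\begin{aligned}
R_W &= \frac{R_M + 0.5}{256}\, 2^{E-128} \\
G_W &= \frac{G_M + 0.5}{256}\, 2^{E-128} \\
B_W &= \frac{B_M + 0.5}{256}\, 2^{E-128}
\end{aligned}$$

The conversions for XYZE are precisely the same, with the exception that CIE X,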
Y, and Z are substituted for R, G, and B, respectively. Because the encoding does 
not support negative values, using XYZE instead of RGBE extends the range to cover 
the entire visible gamut. (See Chapter 2 for details on the CIE XYZ space.) The 
dynamic range for these encodings is quite large (over 76 orders of magnitude), 
and the accuracy is sufficient for most applications. Run-length encoding achieves 
an average of 25% compression (1:1.3), making the image files about as big as 
uncompressed 24-bit RGB. 
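
The RGBE conversion can be sketched in a few lines. The version below is illustrative only and is not taken from the Radiance source: it assumes non-negative input, and it uses `math.frexp` to split off the shared exponent, which matches the ceiling-based formula except when the largest component is an exact power of two (where the literal formula would appear to produce a mantissa of 256). The function names are ours.

```python
import math


def float_to_rgbe(r, g, b):
    """Pack a non-negative scene-referred color into (R_M, G_M, B_M, E)."""
    largest = max(r, g, b)
    if largest < 1e-38:
        return (0, 0, 0, 0)                    # special case: black
    mantissa, exponent = math.frexp(largest)   # largest = mantissa * 2**exponent
    scale = mantissa * 256.0 / largest         # equals 256 / 2**exponent
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)


def rgbe_to_float(rm, gm, bm, e):
    """Unpack (R_M, G_M, B_M, E) using the reverse conversion given above."""
    if (rm, gm, bm, e) == (0, 0, 0, 0):
        return (0.0, 0.0, 0.0)
    factor = math.ldexp(1.0, e - 128 - 8)      # 2**(E - 128) / 256
    return ((rm + 0.5) * factor, (gm + 0.5) * factor, (bm + 0.5) * factor)


packed = float_to_rgbe(1.0, 0.5, 0.25)
print(packed, rgbe_to_float(*packed))
```

Because the three mantissas share one exponent, the largest component keeps close to 8 bits of precision, whereas components much smaller than the largest lose accuracy.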


3.3.2 THE TIFF FLOAT AND LOGLUV FORMATS 


For over a decade, the Tagged Image File Format (.tif, .tiff) has included a
32-bit/component IEEE floating-point RGB encoding [2]. This standard encoding is
in some ways the ultimate in HDR image representations, covering nearly 79 orders
of magnitude in minuscule steps. The flip side to this is that it takes up more space
than any other HDR encoding — over three times the space of the Radiance for- 
mat (described in the preceding section). The TIFF library does not even attempt to 
compress this encoding, because floating-point data generally do not compress very 
well. Where one might get 30% compression from run-length encoding of RGBE 
data, 10% is the most one can hope for using advanced entropy compression (e.g., 
ZIP) on the same data stored as IEEE floats. This is because the last 12 bits or more 
of each 24-bit mantissa will contain random noise from whatever camera or global 
illumination renderer generated them. There simply are no image sources with 7 
decimal digits of accuracy, unless they are completely synthetic (e.g., a smooth gra- 
dient produced by a pattern generator). 

Nevertheless, 96-bit/pixel RGB floats have a place, and that is as a lossless inter- 
mediate representation. TIFF float is the perfect encoding for quickly writing out 
the content of a floating-point frame buffer and reading it later without loss. Sim- 
ilarly, raw floats are a suitable means of sending image data to a compositor over 
a high-bandwidth local connection. They can also serve as a “gold standard” for 
evaluating different HDR representations, as shown in Section 3.4. However, most 
programmers and users are looking for a more compact representation, and within 
TIFF there are two: 24-bit and 32-bit LogLuv. 

The LogLuv encoding was introduced as a perceptually based color encoding
for scene-referred images [71]. Like the IEEE float encoding just described,
LogLuv is implemented as part of the popular public domain TIFF library. (See
www.remotesensing.org/libtiff. Appropriate examples are also included on the companion 
DVD-ROM.) The concept is the same for the 24-bit and 32-bit/pixel variants, but 
they achieve a different range and accuracy. In both cases, the scene-referred data is 
converted to separate luminance (Y) and CIE (u, v) channels. (Review Chapter 2 for 
the conversions between CIE and RGB color spaces.) The logarithm of luminance 
is then taken, and the result is quantized into a specific range, which is different 
for the two encodings, although both reserve the 0 code for Y = 0 (black). In the 
case of the 24-bit encoding, only 10 bits are available for the log luminance value. 
Quantization and recovery are computed as follows. 


$$\begin{aligned}
L_{10} &= \left\lfloor 64 \left( \log_2 Y_W + 12 \right) \right\rfloor \\
Y_W &= 2^{\frac{L_{10} + 0.5}{64} - 12}
\end{aligned}$$


This encoding covers a world luminance (Yw) range of 0.00025:15.9, or 4.8 orders 
of magnitude in uniform (1.1%) steps. In cases in which the world luminance 
is skewed above or below this range, we can divide the scene luminances by a 
constant and store this calibration factor in the TIFF STONITS tag.⁴ When decoding
the file, applications that care about absolute values consult this tag and multiply the 
extracted luminances accordingly. 
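
The 10-bit luminance quantization can be written out as follows. This is a sketch of the two formulas above rather than the TIFF library's code. The clamping of out-of-range luminances to codes 1 through 1,023 is our assumption, as are the function names.

```python
import math


def logluv24_encode_luminance(y):
    """Quantize world luminance to a 10-bit code; code 0 is reserved for black."""
    if y <= 0.0:
        return 0
    code = math.floor(64.0 * (math.log2(y) + 12.0))
    return min(max(code, 1), 1023)  # clamp to the representable range (assumed)


def logluv24_decode_luminance(code):
    """Recover world luminance from a 10-bit code."""
    if code == 0:
        return 0.0
    return 2.0 ** ((code + 0.5) / 64.0 - 12.0)


# Range covered by the non-zero codes, and the step between adjacent codes.
print(logluv24_decode_luminance(1), logluv24_decode_luminance(1023))
print(logluv24_decode_luminance(501) / logluv24_decode_luminance(500))
```

The printed range and step ratio can be compared against the 0.00025:15.9 range and 1.1% step size quoted above.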

The remaining 14 bits of the 24-bit LogLuv encoding are used to represent chro- 
maticity, based on a lookup of CIE (u, v) values, as diagrammed in the lower por- 
tion of Figure 3.3. A zero lookup value corresponds to the smallest v in the visible 
gamut, and subsequent table entries are built up left to right, then bottom to top, 
in the diagram. The uniform step size for u and v is 0.0035, which is just large 
enough to cover the entire visible gamut in $2^{14}$ codes. The idea is that employing a
perceptually uniform color space, in which equal steps correspond to equal differences 
in color, keeps quantization errors below the visible threshold. Unfortunately, both 
the (u, v) step size and the luminance step size for the 24-bit encoding are slightly 
larger than the ideal. This quantization was chosen to cover the full gamut over a 
reasonable luminance range in a 24-bit file, and the TIFF library applies dithering⁵
by default to hide steps where they might otherwise be visible. Because there is no
compression for the 24-bit LogLuv encoding, there is no penalty in dithering.

4 STONITS stands for “sample-to-nits.” Recall from Chapter 2 that the term nits is shorthand for candelas/meter².

FIGURE 3.3 Bit breakdown for 24-bit LogLuv encoding and method used for CIE (u, v)
lookup.

The 32-bit LogLuv TIFF encoding is similar to the 24-bit LogLuv variant, but 
allows a greater range and precision. The conversion for luminance is as follows. 


$$L_{15} = \left\lfloor 256 \left( \log_2 Y_W + 64 \right) \right\rfloor$$

This 15-bit encoding of luminance covers a range of $5.5 \times 10^{-20} : 1.8 \times 10^{19}$,
or 38 orders of magnitude in 0.3% steps. The bit breakdown for this encoding is
shown in Figure 3.4. The leftmost bit indicates the sign of luminance, permitting
negative values to be represented.⁶ The CIE u and v coordinates are encoded in 8
bits each, which allows for sufficiently small step sizes without requiring a lookup.
The conversion for chromaticity is simply

$$\begin{aligned}
u_8 &= \left\lfloor 410\, u' \right\rfloor \\
v_8 &= \left\lfloor 410\, v' \right\rfloor \\
u' &= \frac{u_8 + 0.5}{410} \\
v' &= \frac{v_8 + 0.5}{410}
\end{aligned}$$

5 Dithering is accomplished during encoding by adding a random variable in the (-0.5,0.5) range immediately before
integer truncation.

FIGURE 3.4 Bit breakdown for 32-bit LogLuv encoding. Upper- and lower-order bytes are
separated per scan line during run-length compression to reduce file size.


Again, dithering may be applied by the TIFF library to avoid any evidence of quan- 
tization, but it is not used for 32-bit LogLuv by default because the step sizes are 
below the visible threshold and run-length compression would be adversely af- 
fected. The compression achieved by the library for undithered output is 10 to 
70%. Average compression is 40% (1:1.7). 
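
A 32-bit LogLuv pixel can be packed and unpacked as sketched below. This is not the TIFF library's implementation. The field order (sign, 15-bit luminance, u, v, from the most significant bit down) follows the description above, the luminance decoding is assumed by analogy with the 24-bit formula, and the clamping and function names are ours.

```python
import math


def _clamp(value, lo, hi):
    return max(lo, min(hi, value))


def logluv32_encode(y, u, v):
    """Pack luminance y and chromaticity (u', v') into one 32-bit integer."""
    sign = 1 if y < 0.0 else 0
    magnitude = abs(y)
    if magnitude == 0.0:
        l15 = 0                                  # code 0 is reserved for black
    else:
        l15 = _clamp(math.floor(256.0 * (math.log2(magnitude) + 64.0)), 1, 32767)
    u8 = _clamp(math.floor(410.0 * u), 0, 255)
    v8 = _clamp(math.floor(410.0 * v), 0, 255)
    return (sign << 31) | (l15 << 16) | (u8 << 8) | v8


def logluv32_decode(word):
    """Unpack a 32-bit integer into (y, u', v')."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    l15 = (word >> 16) & 0x7FFF
    u8 = (word >> 8) & 0xFF
    v8 = word & 0xFF
    y = 0.0 if l15 == 0 else sign * 2.0 ** ((l15 + 0.5) / 256.0 - 64.0)
    return y, (u8 + 0.5) / 410.0, (v8 + 0.5) / 410.0


word = logluv32_encode(1.0, 0.21, 0.47)
print(hex(word), logluv32_decode(word))
```

Adding 0.5 on decoding places each recovered value at the center of its quantization bin, which holds the error to about half a step.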

Most applications will never see the actual encoded LogLuv pixel values, in that
the TIFF library provides conversion to and from floating-point XYZ scan lines.
However, it is possible through the use of lookup on the raw encoding to combine
the reading of a LogLuv file with a global tone-mapping operator, thus avoiding
floating-point calculations and providing for rapid display [70]. The TIFF library
provides raw data access for this purpose.

6 This is useful for certain image-processing operations, such as compositing and error visualizations.


3.3.3 THE OPENEXR FORMAT 


The EXtended Range format (.exr) was made available as an open-source C++ library 
in 2002 by Industrial Light and Magic (see www.openexr.com) [62]. It is based on a 
16-bit half floating-point type, similar to IEEE float with fewer bits. Each RGB pixel 
occupies a total of 48 bits, broken into 16-bit words (as shown in Figure 3.5). The 
Half data type is also referred to as S5E10, for “sign, five exponent, ten mantissa.” 
The OpenEXR library also supports full 32-bit/channel (96-bit/pixel) floats and a 
new 24-bit/channel (72-bit/pixel) float type introduced by Pixar. We have already 
discussed the 32-bit/channel IEEE representation in the context of the TIFF format, 
and we have no further information on the 24-bit/channel type at this time. We 
will therefore restrict our discussion to the 16-bit/channel Half encoding. 

The formula for converting from an encoded Half value follows. Here, S is the 
sign bit, E the exponent (0 to 31), and M the mantissa (0 to 1,023). 


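$$
h = \begin{cases}
(-1)^{S}\, 2^{E-15} \left(1 + \dfrac{M}{1024}\right) & 1 \le E \le 30 \\
(-1)^{S}\, 2^{-14}\, \dfrac{M}{1024} & E = 0
\end{cases}
$$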


FIGURE 3.5 Bit breakdown for the OpenEXR Half pixel encoding (sign, exponent, mantissa).


If the exponent E is 31, the value is infinity if M = 0 and NaN (not a number)
otherwise. Zero is represented by all zero bits. The largest representable value in 
this encoding is 65,504, and the smallest normalized (i.e., full accuracy) value is 
0.000061. This basic dynamic range of 9 orders is enhanced by the “denormalized” 
values below 0.000061, which have a relative error below 5% down to 0.000001, 
for a total dynamic range of 10.7 orders of magnitude. Over most of this range, 
the quantization step size is under 0.1%, which is far below the visible thresh- 
old. This permits extensive image manipulations before artifacts become evident, 
which is one of the principal strengths of this encoding. Another advantage of the 
Half encoding is that the selfsame 16-bit representation is specified in NVidia’s Cg 
language [79]. This will ultimately make transfer to and from graphics hardware 
straightforward and is promising for future hardware standardization as well. 
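
To make the bit layout concrete, the following sketch decodes a 16-bit Half word by hand. It is not taken from the OpenEXR library; the function name is hypothetical, and the E = 0 branch assumes the usual IEEE-style treatment of denormalized values described in the text.

```python
import math


def decode_half(bits):
    """Decode a 16-bit S5E10 word (given as an integer) to a Python float."""
    s = (bits >> 15) & 0x1    # sign bit
    e = (bits >> 10) & 0x1F   # 5-bit exponent, 0 to 31
    m = bits & 0x3FF          # 10-bit mantissa, 0 to 1,023
    sign = -1.0 if s else 1.0
    if e == 31:               # infinity or NaN
        return sign * math.inf if m == 0 else math.nan
    if e == 0:                # zero and denormalized values
        return sign * 2.0 ** -14 * (m / 1024.0)
    return sign * 2.0 ** (e - 15) * (1.0 + m / 1024.0)


largest = decode_half(0x7BFF)              # E = 30, M = 1,023
smallest_normalized = decode_half(0x0400)  # E = 1, M = 0
```

Here `largest` evaluates to 65,504 and `smallest_normalized` to 2^-14 (about 0.000061), matching the limits quoted above.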

The OpenEXR library contains C++ classes for reading and writing EXR image 
files, with support for lossless compression, tiling, and mip mapping. Compression 
is accomplished using the ZIP deflate library as one alternative, or Industrial Light 
and Magic’s (ILM) more efficient PIZ lossless wavelet compression. From our ex- 
periments, PIZ achieves a 60% reduction on average compared to uncompressed 
48-bit/pixel RGB. OpenEXR also supports arbitrary data channels, including alpha, 
depth, and user-defined image data. Similar to the TIFF format, standard attributes 
are provided for color space, luminance calibration, pixel density, capture date, cam- 
era settings, and so on. User-defined attributes are also supported, and unique to 
OpenEXR is the notion of a “display window” to indicate the active region of an 
image. This is particularly useful for special effects compositing, wherein the notion 
of what is on-screen and what is off-screen may evolve over the course of a project. 


3.3.4 OTHER ENCODINGS 


There are a few other encodings that have been used or are being used to represent 
medium-dynamic-range image data (i.e., between 2 and 4 orders of magnitude). 
The first is the Pixar log encoding, which is available in the standard TIFF library 
along with LogLuv and IEEE floating point. This 33-bit/pixel encoding assigns
11 bits each to red, green, and blue using a logarithmic mapping designed to fit the
range of movie film. The implementation covers about 3.8 orders of magnitude in
0.4% steps, making it ideal for film work but marginal for HDR work. Few people
have used this encoding outside of Pixar, and Pixar itself has since moved to a
higher-precision format. Another image standard that is even more specific to film
is the Cineon format, which usually records logarithmic density in 10 bits/channel 
over a 2.0 range (www.cineon.com). Although these 2.0 orders of magnitude may cor- 
respond to slightly more range once the film response curve has been applied, it 
does not qualify as an HDR encoding, and it is not scene-referred. Mechanically, the 
Cineon format will handle greater bit depths, but the meaning of such an extension 
has never been defined. 
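
As a rough consistency check on the Pixar log figures quoted above, assume the 11-bit codes are spaced uniformly in log space over the 3.8 orders of magnitude (the uniform spacing is our assumption for illustration; the text says only that the mapping is logarithmic). The relative step between adjacent codes is then

$$
10^{3.8/2^{11}} - 1 = 10^{0.001855} - 1 \approx 0.0043,
$$

or about 0.4%, in line with the stated step size.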

More recently, the IEC has published a standard that defines the scRGB48, scRGB- 
nl, and scYCC-nl encodings, listed in Table 3.2 [56]. As shown in Table 3.2, these 
encodings also encompass a relatively small dynamic range, and we are not aware 
of any software product or file format that currently uses them. We will therefore 
leave this standard out of this discussion, although an analysis may be found at 
www.anyhere.com/gward/hdrenc as well as on the companion DVD-ROM. 


3.3.5 EMERGING “LOSSY” HDR FORMATS 


All of the HDR image formats we have discussed so far, and indeed all of the HDR 
standards introduced to date, are lossless insofar as once the original scene values have 
been converted into the encoding, no further loss takes place during storage or sub- 
sequent retrieval. This is a desirable quality in many contexts, especially when an 
image is expected to go through multiple storage and retrieval steps (with possible 
manipulations) before reaching its final state. However, there are some applications
for which a lossy format is preferred, particularly when the storage costs are onerous
or no further editing operations are anticipated or desired. Two such applications lie
just around the corner, and they will need suitable lossy standards to meet their 
needs: HDR photography and HDR video. At the time of writing, two lossy en- 
coding methods have been introduced for HDR: one for still images and one for 
video. 


HDR Disguised as JPEG: The Sub-band Encoding Method

Ward and Simmons developed a still image format that is backward compatible
with the 8-bit JPEG standard [136]. This sub-band encoding method stores a
tone-mapped image as a JPEG/JFIF file, packing restorative information in a
separate 64-Kbyte marker. Naive applications ignore this marker as extraneous, but
newer software can recover the full HDR data by recombining the encoded
information with the tone-mapped image. In other words, 64 Kbytes is enough to
create a scene-referred original from an output-referred JPEG. Since most JPEG
images produced by today’s digital cameras are over 1 Mbyte, this is only about a
5% increase in file size. By comparison, the most compact lossless HDR format
requires 16 times as much storage space as JPEG.


FIGURE 3.6 The sub-band encoding pipeline. (Reprinted from [136].)

Figure 3.6 shows the encoding pipeline, including the to-be-specified tone- 
mapping operator (TM). In principle, any tone-mapping operator can work, but 
we found that the photographic operator [109] (see Section 7.3.6) and the bilateral 
filter operator [23] (see Section 8.1.2) worked the best with this method. Once the
tone-mapped image is derived, the original pixels are divided by the tone-mapped
pixels to obtain a grayscale “ratio image,” which is then compressed and
incorporated in the JPEG file as a sub-band marker.

Figure 3.7 shows an HDR image of a church that has been decomposed into a 
tone-mapped image and the corresponding ratio image. The ratio image is down- 
sampled and log encoded before being passed to the JPEG compressor to squeeze it 
into 64 Kbytes. This size is the upper limit for a JFIF marker, and the ratio image 
size and compression quality are optimized to fit within a single marker, although 
multiple markers might be used in some cases. Loss of detail is prevented either by
enhancing edges in the tone-mapped image to compensate for the downsampled
ratio image or by synthesizing high frequencies in the ratio image during upsampling,
depending on the application and user preference. The dynamic range of
the format is unrestricted in the sense that the log encoding for the ratio image is
optimized to cover the input range with the smallest step size possible.


FIGURE 3.7 An HDR image of a church divided into a tone-mapped version and the downsampled
ratio image that is stored as a sub-band.

Figure 3.8 illustrates the decode process. A naive application extracts the tone- 
mapped pixels and treats them as a standard output-referred image. An HDR appli- 
cation, however, recognizes the sub-band and decompresses both this ratio image 
and the tone-mapped version, multiplying them together to recover the original 
scene-referred data. 
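
The core arithmetic of the ratio image can be sketched as follows. This is a simplified illustration under stated assumptions, not the Ward and Simmons implementation: it works on a flat list of luminances, omits downsampling and JPEG compression entirely, and uses 8-bit log codes. All function names are hypothetical, and the simple global operator L/(1+L) merely stands in for the unspecified TM.

```python
import math


def encode_ratio_image(hdr, tone_mapped, levels=256):
    """Log-encode the per-pixel ratio HDR / tone-mapped into integer codes.

    Both inputs are lists of positive luminances of equal length.
    """
    ratios = [h / t for h, t in zip(hdr, tone_mapped)]
    lo, hi = math.log(min(ratios)), math.log(max(ratios))
    span = (hi - lo) or 1.0  # guard against a constant ratio image
    codes = [round((math.log(r) - lo) / span * (levels - 1)) for r in ratios]
    return codes, lo, hi


def decode_hdr(tone_mapped, codes, lo, hi, levels=256):
    """Multiply the tone-mapped pixels by the decoded ratios."""
    span = hi - lo
    return [t * math.exp(lo + c / (levels - 1) * span)
            for t, c in zip(tone_mapped, codes)]


# Stand-in tone mapping (an assumption for illustration only).
hdr = [0.01, 1.0, 100.0, 10000.0]
tone_mapped = [L / (1.0 + L) for L in hdr]
codes, lo, hi = encode_ratio_image(hdr, tone_mapped)
recovered = decode_hdr(tone_mapped, codes, lo, hi)
```

Because the log range is fitted to the actual minimum and maximum ratio, the step size adapts to the input, which is the sense in which the dynamic range of the format is unrestricted.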

Two clear benefits arise from this strategy. First, a tone-mapped version of the
HDR image is immediately available, both for naive applications that cannot handle
anything more and for HDR applications that may be able to perform their own
tone mapping given time but wish to provide the user with immediate feedback.
Second, by making the format backward compatible with the most commonly used
image type for digital cameras, an important barrier to adoption has been removed
for the consumer, and hence for camera manufacturers.


FIGURE 3.8 The sub-band decoding pipeline. The lower left shows the backward-compatible
path for naive applications, and the upper right shows the path for an HDR decoder.

Ward and Simmons tested the sub-band encoding method on 15 different HDR 
images, passing them through a single encoding-decoding cycle and comparing 
them to the original using Daly’s visible differences predictor (VDP) [15]. With 
the exception of a single problematic image, VDP showed that fewer than 0.2% of 
the pixels had visible differences at the maximum quality setting in any one image, 
which went up to an average of 0.6% at a 90% quality setting. 

The problem image was a huge (5,462 x 4,436) Radiance rendering of a car 
in a tunnel, with stochastic samples whose local variance spanned about 5 orders 
of magnitude. This proved too much for the downsampled ratio image to cope 
with, leading to the conclusion that for some large renderings it may be necessary 
to break the 64-Kbyte barrier and record multiple markers to improve accuracy. 
In most cases, however, the authors found that 64 Kbytes was ample for the sub-band,
and the overall image size was therefore comparable to a JPEG file of the same
resolution. This makes a strong argument for lossy HDR compression when file size 
is critical. In addition to digital photography, Internet sharing of HDR imagery is a 
prime candidate for such a format. 


An HDR Extension to MPEG

Mantiuk et al. have introduced an HDR encoding
method built on the open-source XviD library and the MPEG-4 video standard [78].
The diagram in Figure 3.9 shows the standard MPEG compression pipeline in black.
Extensions to this pipeline for HDR are shown in blue. The modified MPEG encoding
pipeline proceeds as follows for each frame.


1 A 32-bit/channel XYZ image is taken as input rather than 8-bit/channel RGB.

2 XYZ is converted into CIE (u,v) coordinates of 8 bits each and an 11-bit 
perceptual luminance encoding, Lp. 

3 This 11/8/8 bit encoding is passed through a modified discrete cosine trans- 
form (DCT), which extracts high-contrast edges from the luminance channel 
for separate run-length encoding. 

4 The DCT blocks are quantized using a modified table and passed through a 
variable-length coder. 

5 The edge blocks are joined with the DCT blocks in an HDR-MPEG bit stream. 


The decoding process is essentially the reverse of this, recombining the edge blocks 
at the DCT reconstruction stage to get back Lp (u, v) color values for each pixel. 
These may then be decoded further, into CIE XYZ floats, or passed more efficiently 
through appropriate lookup tables for real-time display (e.g., tone mapping). 

One of the key optimizations in this technique is the observation that the entire
visible range of luminances, 12 orders of magnitude, can be represented in
only 11 bits using invisible quantization steps. By taking advantage of the human
contrast versus intensity (CVI) curve, it is possible to find a varying step size from
the minimum perceivable luminance to the maximum, avoiding wasted codes in
the darker regions (where the eye is less sensitive) [35].⁷ This implicitly assumes
that the encoded information has some reasonable calibration, and that the ultimate
consumer is a human observer as opposed to a rendering algorithm in need of HDR
input. These are defensible assumptions for compressed video. Even if the absolute
calibration of the incoming luminances is unknown, a suitable multiplier could be
found to take the maximum input value to the maximum Lp representation, thus
avoiding clipping. In that an observer would not be able to see more than 12 orders
of magnitude below any safe maximum, such a scaling would provide full input
visibility. The only exception to this is if the input contains an unreasonably bright
source in the field of view, such as the sun.


7 The CVI curve is equal to the threshold versus intensity (TVI) curve divided by adaptation luminance. The TVI function is
discussed in detail in Chapter 6.
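
A quick calculation shows why a varying step size matters. If the 11 bits were instead spent on uniform logarithmic steps across all 12 orders of magnitude, each step would be a fixed relative increment, computed below. The function name is hypothetical, and the comparison figure of roughly 1% for the contrast threshold at bright adaptation levels is an assumption used for illustration, not a value taken from this section.

```python
ORDERS_OF_MAGNITUDE = 12  # visible luminance range quoted in the text
BITS = 11                 # luminance bits in the encoding


def uniform_log_step(orders, bits):
    """Relative step between adjacent codes for a uniform log encoding."""
    codes = 2 ** bits
    return 10.0 ** (orders / (codes - 1)) - 1.0


step = uniform_log_step(ORDERS_OF_MAGNITUDE, BITS)
print(f"uniform log step: {step:.2%}")
```

The result is a step of roughly 1.4%, which would be at or above the assumed 1% threshold in bright regions while being needlessly fine in dark regions, where the eye tolerates much larger steps. Following the CVI curve redistributes the same 2,048 codes so that steps are small only where they need to be.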

Figure 3.10 shows the human CVI curve compared to the quantization errors
for the encodings we have discussed. The blue line shows the error associated with
Mantiuk et al.'s Lp encoding, which mirrors human contrast sensitivity while staying
comfortably below the visible threshold. When HDR video displays enter the
market, an extended video standard will be needed, and this research is an important
step toward such a standard.


FIGURE 3.10 Quantization error of HDR encodings as a function of adapting luminance. (The
sawtooth form of the floating point encodings has been exaggerated for clarity.)


3.4 HDR ENCODING COMPARISON 


To compare HDR image formats, we need a driving application. Without an applica- 
tion, there are no criteria, just speculation. The application determines the context 
and sets the requirements. 

For our example comparison, we have chosen a central application for HDR: 
scene-referred image archival. Specifically, we wish to store HDR images from any 
source to be displayed at some future date on an unknown device at the highest 
quality it supports. Assuming this display has not been invented, there is no basis 
for writing to an output-referred color space, and hence a scene-referred encoding 
is the only logical representation. Thus, the need for HDR. 

A reasonable assumption is that a full spectral representation is not necessary, 
because humans perceive only three color dimensions. (Refer to Chapter 2.) Further, 
we assume that it is not necessary to record more than the visible gamut, although it 
is not safe to assume we can store less. Likewise, the quantization steps must be kept 
below the visible threshold, but because we plan no further manipulations prior 
to display, extra accuracy only means extra storage. The requirements for image 
archiving are therefore as follows. 


• Cover the visible gamut with a tristimulus color space (XYZ, RGB, and so
on)

• Cover the full range of perceivable luminances

• Have quantization steps below the visible threshold at all levels


Furthermore, it is desirable for the format to:


• Minimize storage costs (Mbytes/Mpixel)
• Encode and decode quickly


Considering the previous requirements list, we can rule out the use of the RGBE
encoding (which does not cover the visible gamut) and the 24-bit LogLuv encoding,
which does not cover the full luminance range. This leaves us with the XYZE
encoding (.hdr), the IEEE floating-point and 32-bit LogLuv encodings (.tif), and the
Half encoding (.exr). Of these, the IEEE float representation will clearly lose in terms 
of storage costs, but the remaining choices merit serious consideration. These are 
as follows. 


• The 32-bit Radiance XYZE encoding
• The 32-bit LogLuv encoding
• The 48-bit OpenEXR Half encoding


On the surface, it may appear that the XYZE and LogLuv encodings have a slight edge 
in terms of storage costs, but the OpenEXR format includes a superior compression 
engine. In addition, the extra bits in the Half encoding may be worthwhile for 
some archiving applications that need or desire accuracy beyond normal human 
perception. To evaluate the candidate formats, the following test was conducted on 
a series of IEEE floating-point images, some captured and some rendered. 


1 The data is encoded into the test format, noting the CPU time and disk space 
requirements. 

2 The data is then decoded, noting the CPU time required. 

3 The decoded data is then compared to the original using the CIE ΔE* 1994
perceptual color difference metric.


CIE ΔE* 1994 is an updated version of the perceptual difference metric presented
in Chapter 2 [80]. Using this metric, an encoded pixel color can be compared
to the original source pixel by computing the visible difference. However, we
must first modify the difference metric to consider local adaptation in the context 
of HDR imagery. To do this, the brightest Y value within a fixed region about the 
current pixel is found, and this value is used as the reference white. This simulates 
the effect of a viewer adapting their vision (or display) locally, as we would expect 
them to do with an HDR image. The only question is how large a region to use, 
and for this a reasonable choice is to use a radius of 50 pixels, as this tends to be 
the size of interesting objects in our test image set. 
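
The local reference white described above amounts to a maximum filter over a circular neighborhood. The brute-force sketch below shows the idea on a small luminance grid; it is an illustration under our own assumptions (hypothetical function name, nested lists instead of an image library), and its cost grows with the square of the radius, so a practical version would use a faster filter.

```python
def local_reference_white(Y, radius=50):
    """For each pixel, return the brightest Y within `radius` pixels.

    Y is a list of rows of luminance values; the result has the same shape.
    """
    height, width = len(Y), len(Y[0])
    r2 = radius * radius
    white = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best = Y[y][x]
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                dy2 = (yy - y) ** 2
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    # Keep only neighbors inside the circular region.
                    if dy2 + (xx - x) ** 2 <= r2 and Y[yy][xx] > best:
                        best = Y[yy][xx]
            white[y][x] = best
    return white


# Tiny example with radius 1 so the neighborhoods are easy to check by hand.
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
white = local_reference_white(grid, radius=1)
```

Each entry of `white` then serves as the reference white when the color difference is evaluated at that pixel.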

Among the test images, we included a synthetic pattern that covered the full
visible gamut and dynamic range with sufficient density to sample quantization
errors at all levels. This pattern, a spiral slice through the visible gamut from 0.01
to 1,000,000 cd/m², is shown in Figure 3.11. (This image is included on the
companion DVD-ROM as an IEEE floating-point TIFF.) Each peak represents one
revolution through the visible color gamut, and each revolution spans one decade
(factor of 10) in luminance. The gray-looking regions above and below the slice
actually contain random colors at each luminance level, which provide an even
more thorough testing of the total space. Obviously, tone mapping has been used
to severely compress the original dynamic range and colors in order to print this
otherwise undisplayable image.


FIGURE 3.11 The gamut test pattern, spanning eight orders of magnitude. This image was tone
mapped with the histogram adjustment technique for the purpose of reproduction in this book (see
Section 7.2.8).

Figure 3.12 shows the CIE ΔE* encoding errors associated with a 24-bit sRGB
file, demonstrating how ill suited LDR image formats are for archiving real-world
colors. Only a narrow region covering under two orders of magnitude with an incomplete
gamut is below the visible difference threshold (2.0 in ΔE*). In contrast,
the three HDR encodings we have chosen for this application do quite well on this
test pattern, as shown in Figure 3.13. Errors are held below the visible threshold in
each encoding over all eight orders, except for a few highly saturated colors near
the top of the EXR Half range. The average ΔE* values for Radiance XYZE, 32-bit
LogLuv, and EXR Half were 0.2, 0.3, and 0.06, respectively.

Figure 3.14 shows the two encodings we rejected on the basis that they did not 
cover the full dynamic range and gamut, and indeed we see they do not. As ex- 
pected, the Radiance RGBE encoding is unable to represent highly saturated colors, 
although it easily spans the dynamic range. The 24-bit LogLuv encoding, on the
other hand, covers the visible gamut, but only spans 4.8 orders of magnitude. Although
they may not be well suited to our proposed application, there are other
applications to which these encodings are perfectly suited. In some applications, for 
example, there is no need to represent colors outside those that can be displayed 
on an RGB monitor. Radiance RGBE has slightly better resolution than XYZE in the 
same number of bits and does not require color transformations. For other appli- 
cations, 4.8 orders of magnitude may be sufficient because they only need to cover 
the human simultaneous range; that is, the range over which an observer can com- 
fortably adapt without the use of blinders. Because 24-bit LogLuv covers the full 
gamut in this range as well, applications that need to fit the pixel data into a 24- 
bit buffer for historical reasons may prefer it to the 32-bit alternatives. It was used 
in a proprietary hardware application in which a prepared 24-bit lookup translates 
scene-referred colors to device space via a 16-million entry table. Such a lookup 
would be impossible with a 32-bit encoding, which would require 4 billion en- 
tries. 

In addition to color gamut and dynamic range, we are also interested in the 
statistical behavior of these formats on real images, especially with regard to file 
size and compression times. Figure 3.15 shows a test set of 34 images. Of these, 19 
are HDR photographs of real scenes and 15 are computer generated, and sizes range 
from 0.2 to 36 Mpixels, with 2.4 Mpixels being average. 


FIGURE 3.12 This false color plot shows the visible error behavior of the 24-bit sRGB encoding
on the test pattern shown in Figure 3.11. (CIE ΔE* values above 2 are potentially visible, and
above 5 are evident.)


FIGURE 3.13 Error levels for the chosen HDR encodings (panels: HDR XYZE, 32-bit LogLuv, and
EXR Half) applied to the gamut test pattern from Figure 3.11, using the same scale as shown in
Figure 3.12.


Figure 3.16 charts the read/write performance and file size efficiency for each 
of the three selected formats. This figure shows that the Radiance HDR format has 
the fastest I/O performance, but creates larger files. The OpenEXR library is con- 
siderably slower with its I/O, but creates smaller files than Radiance, despite the 
48 bits of the Half pixel encoding. The 32-bit LogLuv TIFF format has intermediate 
I/O performance, and produces the smallest files. 

The average CIE ΔE* error performance of the three encodings is the same over
the entire image set as we reported for the gamut test alone, with the following
exceptions. One of the test images from ILM, “Desk,” contained pixel values that
are completely outside the visible gamut and could not be reproduced with either
Radiance XYZE or 32-bit LogLuv. Because we do not expect a need for archiving
colors that cannot be seen or reproduced, this should not count against these two
encodings for this application. A few of the renderings had pixels outside the representable
dynamic range of EXR’s Half data type. In those cases, we did not resort
to scaling the images to fit within the $10^{-6}$:$10^{4}$ range as we might have.


FIGURE 3.14 The CIE ΔE* associated with the gamut test pattern in Figure 3.11 for the
Radiance RGBE and 24-bit LogLuv encodings (panels: HDR RGBE, 24-bit LogLuv), using the same
scale as Figure 3.12.

In summary, we found that the XYZE and LogLuv encodings are restricted to the 
visible gamut, and the Half encoding has a slightly smaller dynamic range. Neither 
of these considerations is particularly bothersome, and thus we conclude that all 
three encodings perform well for HDR image archiving. 


3.5 CONCLUSIONS 


The principal benefit of using scene-referred HDR images is their independence
from the display process. A properly designed HDR format covers the full range
and sensitivity of human vision, and is thus prepared for any future display technology
intended for humans. Many HDR formats offer the further benefit, through
additional range and accuracy, of permitting complex image operations without
exposing quantization and range errors typical of more conventional LDR formats.
The cost of this additional range and accuracy is modest, similar to including an
extra alpha channel in an LDR format. This burden can be further reduced in cases
in which accuracy is less critical (i.e., when multiple image read/edit/write cycles
are not expected).


FIGURE 3.15 The test set of 34 HDR images.

All of the existing HDR file formats are “lossless” in the sense that they do not 
lose information after the initial encoding, and repeated reading and writing of the 
files does not result in further degradation. However, it seems likely that “lossy” 
HDR formats will soon be introduced that offer much better compression, on a par 
with existing JPEG images. This will remove an important barrier to HDR adoption 
in markets such as digital photography and video and in web-based applications 
such as virtual reality tours. 

The Resources section of the DVD-ROM includes complete demonstration software and images employ- 
ing the JPEG-HDR lossy compression format described as preliminary work in this chapter. 


HDR Image Capture 


HDR images may be captured from real scenes or rendered using 3D computer
graphics (CG) techniques such as radiosity and raytracing. A few modern graphics
cards are even capable of generating HDR images directly. The larger topic of CG
rendering is well covered in other textbooks [24,37,58,117,144]. In this chapter, the
focus is on practical methods for capturing high-quality HDR images from real scenes
using conventional camera equipment. In addition, commercial hardware designed to
capture HDR images directly is beginning to enter the market, which is discussed
toward the end of this chapter.


4.1 PHOTOGRAPHY AND LIGHT MEASUREMENT 


A camera is essentially an imperfect device for measuring the radiance distribution 
of a scene, in that it cannot capture the full spectral content and dynamic range. 
(See Chapter 2 for definitions of color and radiance.) The film or image sensor in a 
conventional or digital camera is exposed to the color and dynamic range of a scene, 
as the lens is a passive element that merely refocuses the incoming light onto the 
image plane. All of the information is there, but limitations in sensor design prevent 
cameras from capturing all of it. Film cameras record a greater dynamic range than 
their digital counterparts, especially when they expose a negative emulsion. 
Standard black-and-white film emulsions have an inverse response to light, as
do color negative films. Figure 4.1 shows example response curves for two film
emulsions, demonstrating a sensitive range of nearly 4 log units, or a 10,000:1
contrast ratio. Depending on the quality of the lens and the blackness of the camera’s
interior, some degree of “flare” may inhibit this range, particularly around a
bright point such as the sun or its reflection. Although the capability of recording
4 log units of dynamic range is there, flare may reduce the effective dynamic range
somewhat.


FIGURE 4.1 Characteristic film curves showing response for color (R, G, and B) and
black-and-white negative films over nearly four orders of magnitude. (Horizontal axis: log
exposure in lux-seconds.)

The film development process may also limit or enhance the information re- 
trieved from the exposed emulsion, but the final constraining factor is of course 
the printing process. It is here where tone mapping takes place, in that the effective 
dynamic range of a black-and-white or color print is about 100:1 at best. Darkroom
techniques such as dodge-and-burn may be used to get the most out of a negative,
although in an industrial film processing lab what usually happens is more akin to 
autoexposure after the fact. 

To extract the full dynamic range from a negative, we need to digitize the de- 
veloped negative or apply a “dry developing method” such as that developed by 
Applied Science Fiction and marketed in the Kodak Film Processing Station [25]. 
Assuming that a developed negative is available, a film scanner would be required 
that records the full log range of the negative in an HDR format. Unfortunately, no 
such device exists, although it is technically feasible. However, one can take a stan- 
dard film scanner, which records either into a 12-bit linear or an 8-bit sRGB color 
space, and use multiple exposures to obtain a medium-dynamic-range result from a 
single negative. The process is identical to the idealized case for multiple-exposure 
HDR capture, which we describe in the following section. The same method may 
be used to obtain an HDR image from a sequence of exposures using a standard 
digital camera, or to enhance the dynamic range possible with a film camera. 


4.2 HDR IMAGE CAPTURE FROM MULTIPLE 
EXPOSURES 


Due to the limitations inherent in most digital image sensors, and to a lesser degree 
in film emulsions, it is not possible to capture the full dynamic range of an image 
in a single exposure. However, by recording multiple exposures a standard camera 
with the right software can create a single HDR image (i.e., a radiance map, as defined 
in Chapter 2). These exposures are usually captured by the camera itself, although in 
the case of recovering HDR information from a single negative the same technique 
may be applied during the film-scanning phase. 

By taking multiple exposures, each image in the sequence will have different 
pixels properly exposed, and other pixels under- or overexposed. However, each 
pixel will be properly exposed in one or more images in the sequence. It is therefore 
possible and desirable to ignore very dark and very bright pixels in the subsequent 
computations. 

Under the assumption that the capturing device is perfectly linear, each exposure
may be brought into the same domain by dividing each pixel by the image’s exposure
time. From the recorded radiance values $L_e$, this effectively recovers irradiance
values $E_e$ by factoring out the exposure duration.¹

Once each image is in the same unit of measurement, corresponding pixels may 
be averaged across exposures — excluding, of course, under- and overexposed pix- 
els. The result is an HDR image. 
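
Under the idealized assumptions just stated (a perfectly linear device and perfectly aligned frames), the merge can be sketched as follows. The function name, the normalization of pixel values to the range 0 to 1, and the cutoff values used to reject under- and overexposed pixels are all assumptions made for illustration.

```python
def merge_linear_exposures(exposures, times, low=0.02, high=0.98):
    """Average exposure-normalized pixels, skipping badly exposed ones.

    exposures: list of images, each a flat list of linear values in [0, 1].
    times:     exposure time for each image, in the same order.
    """
    merged = []
    for pixel_values in zip(*exposures):
        usable = [v / t for v, t in zip(pixel_values, times)
                  if low <= v <= high]
        if usable:
            merged.append(sum(usable) / len(usable))
        else:
            # No well-exposed sample: fall back to the one nearest mid-scale.
            v, t = min(zip(pixel_values, times),
                       key=lambda vt: abs(vt[0] - 0.5))
            merged.append(v / t)
    return merged


# Three exposures, two f-stops apart, of scene values 0.1, 1.0, and 8.0.
times = [1.0, 0.25, 0.0625]
exposures = [
    [0.1, 1.0, 1.0],        # longest exposure: two pixels clipped
    [0.025, 0.25, 1.0],
    [0.00625, 0.0625, 0.5], # shortest exposure: first pixel too dark
]
hdr = merge_linear_exposures(exposures, times)
```

In this example the brightest pixel is recovered only from the shortest exposure, while the darkest pixel is averaged from the two longer ones.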

In practice, cameras are not perfectly linear light measurement devices, objects 
frequently do not remain still between individual exposures, and the camera is 
rarely kept still. Thus, in practice this procedure needs to be refined to include cam- 
era response curves, image alignment techniques, and ghost and lens flare removal. 

Extracting a medium-dynamic-range radiance map from a single negative is rel- 
atively straightforward because it does not require alignment of multiple frames, 
and does not suffer from object displacement that may occur during the capture of 
several exposures. It therefore serves as the basis for the techniques presented later 
in this chapter. 


4.3 FILM SCANNING 


In the ideal case for creating an HDR image from multiple LDR exposures, the scene 
or image should be completely static (e.g., an exposed and developed negative). We 
assume that the response curve of the film is known. In addition, the LDR capture 
device (such as an 8-bit/primary film scanner with known response curves) should 
provide some means of exactly controlling the exposure during multiple captures. 

Creating an HDR image under these conditions starts by taking scans with mul- 
tiple exposures. In addition, the system response is inverted to get back to a linear 
relation between scene radiances and pixel values. Each scanned image is multi- 
plied by a calibration factor related to its exposure, and combined into an HDR 
result. The only question is what weighting function to use in averaging together 
the linear exposures. Of course, the lightest and darkest pixels at the limits of each 
exposure should be excluded from consideration because these pixels are under- or 
overexposed. But how should the pixels between be weighted? 


1 The quantity captured by the camera is spectrally weighted radiance. As such, calling this quantity “radiance” is
inappropriate. However, the spectral response curve is typically not the same as the CIE V(λ) curve, and therefore this
quantity also cannot be called “luminance” [18]. When the term radiance or irradiance is used, it should be understood
that this refers to spectrally weighted radiance and irradiance.

FIGURE 4.2 The inverted system response function (solid line) and recommended
Mitsunaga-Nayar weighting function, multiplied by an additional hat function 1 − (2x − 1)^12 to
devalue the extrema, which are often suspect. The horizontal axis is pixel value (0 to 255).


Mann and Picard proposed a certainty/weighting function equal to the derivative
of the system response curve for each color channel, using the argument that greater 
response sensitivity corresponds to greater certainty [5]. Debevec and Malik used 
a simple hat function based on the assumption that mid-range pixels are more 
reliable [18]. Mitsunaga and Nayar used signal theory to argue for multiplying Mann 
and Picard’s weight by the response output, in that larger values are less influenced 
by a constant noise floor [82]. Any of these methods will yield a satisfactory result, 
although the latter weighting function is better supported by signal theory. The 
Mitsunaga-Nayar weighting seems to work best when multiplied by a broad hat
function, as shown in Figure 4.2. This avoids dubious pixels near the extremes,
where gamut limitations and clamping may affect the output values unpredictably.


FIGURE 4.3 Our example exposure sequence. Each image is separated by two f-stops (equal to
a factor of 4, or 0.6 log10 units).


Figure 4.3 shows a sequence of perfectly aligned exposures. Figure 4.4 (left) 
shows the weighting used for each of the three contributing exposures, where blue 
is used for the longest exposure, green for the middle exposure, and red for the 
shortest exposure. As this figure shows, most pixels are a mixture of multiple ex- 
posures, with some pixels relying solely on the extremes of the exposure range. 
Figure 4.4 (right) shows the combined result, tone mapped using a histogram ad- 
justment operator [142]. 
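
The per-pixel combination described above can be sketched in C as follows. This is only an illustration under stated assumptions: the inverse response is taken to be linear (a real scanner or camera needs its measured curve), the weight is the broad hat function of Figure 4.2 on its own, taken here as 1 − (2x − 1)^12 with x scaled to [0, 1], and the names hat_weight, inverse_response, and merge_pixel are hypothetical.

```c
#include <stddef.h>

/* Broad hat weight 1 - (2x - 1)^12 with x = z / 255 (assumed normalization). */
static double hat_weight(unsigned char z)
{
    double h = 2.0 * (z / 255.0) - 1.0;
    double h2 = h * h;
    double h4 = h2 * h2;
    return 1.0 - h4 * h4 * h4;   /* h^12 */
}

/* Hypothetical linear inverse response; a real device needs its measured curve. */
static double inverse_response(unsigned char z)
{
    return z / 255.0;
}

/* Weighted average of the per-exposure radiance estimates for one pixel.
 * z[j] is the pixel value in exposure j; exposure[j] is its relative exposure.
 * Returns -1.0 when every sample has zero weight (all under- or overexposed). */
double merge_pixel(const unsigned char *z, const double *exposure, size_t n)
{
    double sum = 0.0, wsum = 0.0;
    size_t j;
    for (j = 0; j < n; j++) {
        double w = hat_weight(z[j]);
        sum += w * inverse_response(z[j]) / exposure[j];
        wsum += w;
    }
    return wsum > 0.0 ? sum / wsum : -1.0;
}
```

Values of 0 and 255 receive zero weight, so clipped samples drop out of the average and the remaining exposures determine the result.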

If the multiple exposures come not from multiple scans of a single negative but
from multiple negatives or digital images, combining images may become problematic.
First, the camera may shift slightly between exposures, which will result
in some subtle (and possibly not-so-subtle) misalignments that will blur the results.
Second, if the actual system response function is unknown the images must
be aligned before this function can be estimated from the given exposures. Third,
objects in the scene may shift slightly between frames or even make large movements,
such as people walking in the scene as the photos are taken. Finally, flare
in the camera lens may fog areas surrounding particularly bright image regions,
which may not be noticeable in a standard LDR image. We will address each of
these problems in turn and present some workarounds in the following sections.


FIGURE 4.4 The combined HDR result, tone mapped using the histogram adjustment operator
(described in Section 7.2.8) in the right-hand image. The left-hand image shows contributing
input image weights, where blue shows where the longer exposure dominates, green the middle, and
red the shorter exposure. Most output pixels are an average of two or more input values, which
reduces noise in the final result.


4.4 IMAGE REGISTRATION AND ALIGNMENT


Although several techniques have been developed or suggested for image alignment 
and registration, most originating from the computer vision community, only two 
techniques to our knowledge address the specific problem of aligning differently 
exposed frames for the purpose of HDR image creation. The first technique, from 
Kang et al. [63], handles both camera movement and object movement in a scene, 
and is based on a variant of the Lucas and Kanade motion estimation technique [77].
In an off-line postprocessing step, for each pixel a motion vector is computed be- 
tween successive frames. This motion vector is then refined with additional tech- 
niques, such as hierarchical homography (introduced by Kang et al.), to handle 
degenerate cases. 

Once the motion of each pixel is determined, neighboring frames are warped 
and thus registered with one another. Then, the images are ready to be combined 
into an HDR radiance map. The advantage of this technique is that it compensates 
for fairly significant motion, and is suitable (for instance) for capturing HDR video 
by exposing successive frames by different amounts of time. 

Although this method is suitable for significant motion, it relies on knowing the 
camera response function in advance. This presents a catch-22: alignment is needed 
to register samples to derive the camera response function, but the camera response 
function is needed to determine alignment. If the camera response is known or 
can be computed once and stored based on a set of perfectly aligned images, the 
catch-22 is solved. 

A second alignment technique (described in the following section) employs a median
threshold bitmap (MTB), which does not depend on the camera response function
for proper alignment [141]. This technique is also about 10 times faster than 
the Kang et al. method, in that it performs its alignment operations on bitmaps 
rather than 8-bit grayscale images, and does not perform image warping or re- 
sampling. However, the MTB alignment algorithm does not address moving ob- 
jects in the scene, and is not appropriate for arbitrary camera movements such 
as zooming and tilting. The method of Kang et al. may therefore be preferred in 
cases where arbitrary camera movement is expected. In the case of object mo- 
tion, we recommend a simpler and more robust postprocessing technique in Sec- 
tion 4.7. 


4.5 THE MEDIAN THRESHOLD BITMAP ALIGNMENT TECHNIQUE


In this section, we describe a method for the automatic alignment of HDR
exposures [141].² Input to this alignment algorithm is a series of N 8-bit grayscale
images, which may be approximated using only the green channel, or derived as
follows from 24-bit sRGB with integer arithmetic.³


Y = (54 R + 183 G + 19 B) / 256
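
As a minimal sketch, the conversion above maps directly onto integer C arithmetic. The function name gray_from_rgb is an assumption made for illustration; the division by 256 is written as a right shift by 8 bits.

```c
/* Approximate 8-bit luma from 8-bit sRGB components using integer arithmetic.
 * The weights 54, 183, and 19 sum to 256, so the result stays within 0..255. */
unsigned char gray_from_rgb(unsigned char r, unsigned char g, unsigned char b)
{
    return (unsigned char)((54 * r + 183 * g + 19 * b) >> 8);
}
```

Because the weights sum to exactly 256, a white pixel (255, 255, 255) maps to 255 and no clamping is needed.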


One of the N images is arbitrarily selected as the reference image, and the output of 
the algorithm is a series of N-1 (x, y) integer offsets for each of the remaining im- 
ages relative to this reference. These exposures may then be recombined efficiently 
into an HDR image using the camera response function, as described in Section 4.6. 

The computation focuses on integer pixel offsets, because they can be used to 
quickly recombine the exposures without resampling. Empirical evidence suggests 
that handheld sequences do not require rotational alignment in about 90% of cases. 
Even in sequences in which there is some discernible rotation, the effect of a good 
translational alignment is to push the blurred pixels out to the edges, where they 
are less distracting to the viewer. 

Conventional approaches to image alignment often fail when applied to images 
with large exposure variations. In particular, edge-detection filters are dependent on 
image exposure (as shown in the left side of Figure 4.5, where edges appear and 
disappear at different exposure levels). Edge-matching algorithms are therefore ill 
suited to the exposure alignment problem when the camera response is unknown. 
The MTB approach incorporates the following features. 


• Alignment is done on bilevel images using fast bit-manipulation routines.
• The technique is insensitive to image exposure.
• For robustness, it includes noise filtering.


2 Reprinted by permission of A.K. Peters, Ltd., from Greg Ward, “Fast, Robust Image Registration for Compositing High 
Dynamic Range Photographs from Hand-Held Exposures,” Journal of Graphics Tools, 8(2):17—30, 2003. 


3 This is a close approximation of the computation of luminance as specified by ITU-R BT.709. 

FIGURE 4.5 Two unaligned exposures (middle) and their corresponding edge bitmaps (left) and
median threshold bitmaps (right). The edge bitmaps are not used precisely because of their tendency
to shift dramatically from one exposure level to another. In contrast, the MTB is stable with respect
to exposure.


The results of a typical alignment are discussed in Section 4.5.4. If we are to rely on
operations such as moving, multiplying, and subtracting pixels over an entire high- 
resolution image, the algorithm is bound to be computationally expensive, unless 
our operations are very fast. Bitmap images allow us to operate on 32 or 64 pixels 
at a time using bitwise integer operations, which are very fast compared to byte- 
wise arithmetic. We use a bitmap representation that facilitates image alignment 
independent of exposure level, the aforementioned median threshold bitmap. The MTB
is defined as follows. 


1 Determine the median 8-bit value from a low-resolution histogram over the 
grayscale image pixels. 

2 Create a bitmap image with 0s where the input pixels are less than or equal
to the median value, and 1s where the pixels are greater.
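
The two steps above can be sketched in C as follows. The sketch makes several assumptions: it uses a full 256-bin histogram rather than a low-resolution one, it packs 32 pixels per word in a flat array, and the names median_value and threshold_bitmap are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Median 8-bit value: smallest v whose cumulative count reaches half the pixels. */
unsigned char median_value(const unsigned char *gray, size_t n)
{
    size_t hist[256] = {0};
    size_t i, count = 0, half = (n + 1) / 2;
    int v;
    for (i = 0; i < n; i++)
        hist[gray[i]]++;
    for (v = 0; v < 256; v++) {
        count += hist[v];
        if (count >= half)
            break;
    }
    return (unsigned char)(v < 256 ? v : 255);
}

/* Fill a packed bitmap (pixel i is bit i % 32 of word i / 32):
 * 0 where gray <= threshold, 1 where gray > threshold.
 * words must hold at least (n + 31) / 32 entries. */
void threshold_bitmap(const unsigned char *gray, size_t n,
                      unsigned char threshold, uint32_t *words)
{
    size_t i;
    memset(words, 0, ((n + 31) / 32) * sizeof(uint32_t));
    for (i = 0; i < n; i++)
        if (gray[i] > threshold)
            words[i / 32] |= (uint32_t)1 << (i % 32);
}
```

Applying any monotonic brightness change to the input moves the median along with the pixels, so the resulting bitmap is unchanged.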


Figure 4.5 shows two exposures of an Italian stairwell (middle), and their corre- 
sponding edge maps (left) and MTBs (right). In contrast to the edge maps, the 
MTBs are nearly identical for the two exposures. Taking the difference of these two 
bitmaps with an exclusive-or (XOR) operator shows where the two images are mis- 
aligned, and small adjustments in the x and y offsets yield predictable changes in 
this difference due to object coherence. However, this is not the case for the edge 
maps, which are noticeably different for the two exposures, even though we at- 
tempted to compensate for the camera’s nonlinearity with an approximate response 
curve. Taking the difference of the two edge bitmaps would not give a good indi- 
cation of where the edges are misaligned, and small changes in the x and y offsets 
yield unpredictable results, making gradient search problematic. More sophisticated 
methods of determining edge correspondence are necessary to use this information, 
and we can avoid these and their associated computational costs with the MTB-based 
technique. 

The constancy of an MTB with respect to exposure is a very desirable property 
for determining image alignment. For most HDR reconstruction algorithms, the 
alignment step must be completed before the camera response can be determined, 
in that the response function is derived from corresponding pixels in the differ- 
ent exposures. An HDR alignment algorithm that depends on the camera response 
function poses a catch-22 problem, as described earlier. By its nature, an MTB is 
the same for any exposure within the usable range of the camera, regardless of the 
response curve. As long as the camera’s response function is monotonic with respect
to world radiance, the same scene will theoretically produce the same MTB
at any exposure level. This is because the MTB partitions the pixels into two equal
populations: one brighter and one darker than the scene’s median value. Because
the median value does not change in a static scene, the derived bitmaps likewise do
not change with exposure level.⁴

There may be certain exposure pairs that are either too light or too dark to use the 
median value as a threshold without suffering from noise, and for these we choose 
either the 17th or 83rd percentile as the threshold, respectively. Although the offset 
results are all relative to a designated reference exposure, we actually compute offsets 
between adjacent exposures and thus the same threshold may be applied to both 
images. Choosing percentiles other than the 50th (median) results in fewer pixels
to compare, which makes the solution less stable; in such cases we may choose to limit
the maximum offset. The behavior of percentile threshold bitmaps
is otherwise the same as the MTB, including stability over different exposures. In 
the remainder of this section, when we refer to the properties and operations of 
MTBs, the same applies for other percentile threshold bitmaps. 

Once the threshold bitmaps corresponding to the two exposures have been com- 
puted, there are several ways to align them. One brute force approach is to test every 
offset within the allowed range, computing the XOR difference at each offset and 
taking the coordinate pair corresponding to the minimum difference. A more ef- 
ficient approach might follow a gradient descent to a local minimum, computing 
only local bitmap differences between the starting offset (0,0) and the nearest min- 
imum. We prefer a third method, based on an image pyramid that is as fast as 
gradient descent in most cases but more likely to find the global minimum within 
the allowed offset range. 

4 Technically, the median value could change with changing boundaries as the camera moves, but such small changes in
the median are usually swamped by noise, which is removed by this algorithm.


FIGURE 4.6 A pyramid of MTBs is used to align adjacent exposures one bit at a time. The
smallest (rightmost) image pair corresponds to the most significant bit in the final offset.


Multiscale techniques are well known in the computer vision and image-processing
communities, and image pyramids are frequently used for registration
and alignment. (See, for example, [127].) This technique starts by computing an
image pyramid for each grayscale image exposure, with log2(max_offset) levels past
the base resolution. The resulting MTBs are shown for two example exposures in
Figure 4.6. For each smaller level in the pyramid, we take the previous grayscale
image and filter it down by a factor of two in each dimension, computing the MTB
from the grayscale result. The bitmaps themselves should not be subsampled, as the
result will be subtly different and could potentially cause the algorithm to fail.
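
One simple way to filter a grayscale image down by a factor of two is to average each 2-by-2 block. The sketch below assumes that box filter (the text does not specify which filter is used), and the function name shrink2 and the flat row-major layout are illustrative choices.

```c
#include <stddef.h>

/* Halve a grayscale image in each dimension by averaging 2x2 blocks.
 * src is w*h bytes in row-major order; dst must hold (w/2)*(h/2) bytes.
 * Odd trailing rows or columns are dropped in this sketch. */
void shrink2(const unsigned char *src, size_t w, size_t h, unsigned char *dst)
{
    size_t dw = w / 2, dh = h / 2, x, y;
    for (y = 0; y < dh; y++)
        for (x = 0; x < dw; x++) {
            const unsigned char *p = src + 2 * y * w + 2 * x;
            unsigned sum = p[0] + p[1] + p[w] + p[w + 1];
            dst[y * dw + x] = (unsigned char)((sum + 2) / 4);  /* rounded mean */
        }
}
```

The threshold bitmap for each pyramid level would then be computed from the shrunken grayscale image, not by shrinking the previous bitmap.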

To compute the overall offset for alignment, we start with the lowest-resolution
MTB pair and compute the minimum difference offset between them within a range
of ±1 pixel in each dimension. At the next resolution level, we multiply this offset
by 2 (corresponding to the change in resolution) and compute the minimum difference
offset within a ±1 pixel range of this previous offset. This continues to the
highest-resolution (original) MTB, where we get the final offset result. Thus, each
level in the pyramid corresponds to a binary bit in the computed offset value.

At each level, we need to compare exactly nine candidate MTB offsets, and the 
cost of this comparison is proportional to the size of the bitmaps. The total time 
required for alignment is thus linear with respect to the original image resolution 
and independent of the maximum offset, in that the registration step is linear in the 
number of pixels, and the additional pixels in an image pyramid are determined by 
the size of the source image and the (fixed) height of the pyramid. 
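
The linear bound can be seen from a geometric series. As a sketch, write A for the number of pixels in the original image and L for the number of pyramid levels past the base (both symbols are introduced here only for illustration). Level k holds about A/4^k pixels and requires nine bitmap comparisons, so the total work T satisfies:

```latex
T \;\propto\; 9 \sum_{k=0}^{L} \frac{A}{4^{k}}
  \;<\; 9A \sum_{k=0}^{\infty} \frac{1}{4^{k}}
  \;=\; 9A \cdot \frac{4}{3}
  \;=\; 12A
```

The work therefore stays below a fixed multiple of the base image size, however tall the pyramid is.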


4.5.1 THRESHOLD NOISE 


The algorithm just described works well in images that have a fairly bimodal bright- 
ness distribution, but can run into trouble for exposures that have a large number of 
pixels near the median value. In such cases, the noise in near-median pixels shows 
up as noise in the MTB, which destabilizes the difference computations. 

The inset in Figure 4.7 shows a close-up of the pixels in the dark stairwell ex- 
posure MTB, which is representative of the type of noise seen in some images. 
Computing the XOR difference between exposures with large areas such as these 
yields noisy results that are unstable with respect to translation because the pix- 
els themselves tend to move around in different exposures. Fortunately, there is a 
straightforward solution to this problem. 

Because this problem involves pixels whose values are close to the threshold, 
these pixels can be excluded from our difference calculation with an exclusion bitmap. 
The exclusion bitmap consists of 0s wherever the grayscale value is within some 
specified distance of the threshold, and 1s elsewhere. The exclusion bitmap for the
exposure in Figure 4.7 is shown in Figure 4.8, where all bits are zeroed for pixels
within ±4 of the median value.


FIGURE 4.7 Close-up detail of noisy area of MTB in dark stairwell exposure (full resolution).


We compute an exclusion bitmap for each exposure at each resolution level in
the MTB pyramid, and then take the XOR difference result for each candidate offset,
ANDing it with both offset exclusion bitmaps to compute the final difference.⁵ The
effect is to disregard differences that are less than the noise tolerance in our images.


5 If we were to AND the exclusion bitmaps with the original MTBs before the XOR operation, we would inadvertently count 
disagreements about what was noise and what was not as actual pixel differences. 

FIGURE 4.8 An exclusion bitmap, with zeroes (black) wherever pixels in our original image are
within the noise tolerance of the median value.


FIGURE 4.9 The original XOR difference of the unaligned exposures (left), and with the two
exclusion bitmaps ANDed into the result to reduce noise in the comparison (right).


This is illustrated in Figure 4.9, which shows the XOR difference of the unaligned
exposures before and after applying the exclusion bitmaps. By removing those pixels
that are close to the median, the least-reliable bit positions in the smooth gradients
are cleared but the high-confidence pixels near strong boundaries (such as the edges
of the window and doorway) are preserved. Empirically, this optimization seems to
be very effective in eliminating false minima in the offset search algorithm.
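
The masked difference can be sketched in C on packed 32-bit words. This is an illustration only: the flat-array bitmap layout and the names count_bits and masked_difference are assumptions, and the bit counter is a plain loop, not the table lookup discussed in Section 4.5.3.

```c
#include <stddef.h>
#include <stdint.h>

/* Count 1 bits in a 32-bit word (each pass clears the lowest set bit). */
static unsigned count_bits(uint32_t w)
{
    unsigned n = 0;
    while (w) {
        w &= w - 1;
        n++;
    }
    return n;
}

/* Number of differing pixels between two threshold bitmaps, ignoring any
 * pixel that either exclusion bitmap marks as unreliable (bit = 0). */
unsigned long masked_difference(const uint32_t *tb1, const uint32_t *eb1,
                                const uint32_t *tb2, const uint32_t *eb2,
                                size_t nwords)
{
    unsigned long err = 0;
    size_t i;
    for (i = 0; i < nwords; i++)
        err += count_bits((tb1[i] ^ tb2[i]) & eb1[i] & eb2[i]);
    return err;
}
```

The exclusion masks are applied after the XOR, so a disagreement between the two exposures about which pixels are noisy is not itself counted as a difference.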


4.5.2 OVERALL ALGORITHM 


The full algorithm with the exclusion operator is given in the recursive C function
(GetExpShift), shown in Figure 4.10. This function takes two exposure images,
and determines how much to move the second exposure (img2) in x and
y to align it with the first exposure (img1).


    void
    GetExpShift(const Image *img1, const Image *img2,
                int shift_bits, int shift_ret[2])
    {
        int min_err;
        int cur_shift[2];
        Bitmap tb1, tb2;      /* threshold bitmaps */
        Bitmap eb1, eb2;      /* exclusion bitmaps */
        int i, j;

        if (shift_bits > 0) {
            /* Recurse on half-resolution images, then scale the offset up. */
            Image sml_img1, sml_img2;
            ImageShrink2(img1, &sml_img1);
            ImageShrink2(img2, &sml_img2);
            GetExpShift(&sml_img1, &sml_img2, shift_bits-1, cur_shift);
            ImageFree(&sml_img1);
            ImageFree(&sml_img2);
            cur_shift[0] *= 2;
            cur_shift[1] *= 2;
        } else
            cur_shift[0] = cur_shift[1] = 0;
        ComputeBitmaps(img1, &tb1, &eb1);
        ComputeBitmaps(img2, &tb2, &eb2);
        min_err = img1->xres * img1->yres;
        /* Test the nine candidate offsets around the current estimate. */
        for (i = -1; i <= 1; i++)
            for (j = -1; j <= 1; j++) {
                int xs = cur_shift[0] + i;
                int ys = cur_shift[1] + j;
                Bitmap shifted_tb2;
                Bitmap shifted_eb2;
                Bitmap diff_b;
                int err;
                BitmapNew(img1->xres, img1->yres, &shifted_tb2);
                BitmapNew(img1->xres, img1->yres, &shifted_eb2);
                BitmapNew(img1->xres, img1->yres, &diff_b);
                BitmapShift(&tb2, xs, ys, &shifted_tb2);
                BitmapShift(&eb2, xs, ys, &shifted_eb2);
                BitmapXOR(&tb1, &shifted_tb2, &diff_b);
                BitmapAND(&diff_b, &eb1, &diff_b);
                BitmapAND(&diff_b, &shifted_eb2, &diff_b);
                err = BitmapTotal(&diff_b);
                if (err < min_err) {
                    shift_ret[0] = xs;
                    shift_ret[1] = ys;
                    min_err = err;
                }
                BitmapFree(&shifted_tb2);
                BitmapFree(&shifted_eb2);
                BitmapFree(&diff_b);    /* also release the difference bitmap */
            }
        BitmapFree(&tb1); BitmapFree(&eb1);
        BitmapFree(&tb2); BitmapFree(&eb2);
    }


FIGURE 4.10 The GetExpShift algorithm.


The maximum number of bits in the final offsets is determined by the shift_bits
parameter. The more important functions called by GetExpShift are as follows.


ImageShrink2 (const Image *img, Image *img_ret): Subsample
the image img by a factor of two in each dimension and put the result
into a newly allocated image img_ret.


ComputeBitmaps (const Image *img, Bitmap *tb, Bitmap *eb):
Allocate and compute the threshold bitmap tb and the exclusion bitmap
eb for the image img. (The threshold and tolerance to use are included in the
Image struct.)


BitmapShift (const Bitmap *bm, int xo, int yo, Bitmap *bm_ret):
Shift a bitmap by (xo, yo) and put the result into the preallocated
bitmap bm_ret, clearing exposed border areas to zero.


BitmapXOR (const Bitmap *bm1, const Bitmap *bm2, Bitmap *bm_ret):
Compute the XOR of bm1 and bm2 and put the result into
bm_ret.


BitmapTotal (const Bitmap *bm): Compute the sum of all 1 bits in
the bitmap.

Computing the alignment offset between two adjacent exposures is simply a
matter of calling the GetExpShift routine with the two image structs (img1
and img2), which contain their respective threshold and tolerance values. (The
threshold values must correspond to the same population percentiles in the two
exposures.) We also specify the maximum number of bits allowed in the returned
offset, shift_bits. The shift results computed and returned in shift_ret
will thus be restricted to a range of ±2^shift_bits.

There is only one subtle point in this algorithm, which is what happens at the 
image boundaries. Unless proper care is taken, non-zero bits may inadvertently 
be shifted into the candidate image. These would then be counted as differences 
in the two exposures, which would be a mistake. It is therefore crucial that the 
BitmapShift function shifts 0s into the new image areas, so that applying the 
shifted exclusion bitmap to the XOR difference will clear these exposed edge pixels 
as well. This also explains why the maximum shift offset needs to be limited. In the 
case of an unbounded maximum shift offset, the lowest-difference solution will also
have the fewest pixels in common between the two exposures (one exposure will end
up shifted completely off the other). In practice, we have found a shift_bits
limit of 6 (±64 pixels) to work fairly well most of the time.


4.5.3 EFFICIENCY CONSIDERATIONS 


Clearly, the efficiency of the MTB alignment algorithm depends on the efficiency
of the bitmap operations, as nine shift tests with six whole-image bitmap operations
apiece are performed. The BitmapXOR and BitmapAND operations are
easy enough to implement, as we simply apply bitwise operations on 32-bit or 64-bit
words, but the BitmapShift and BitmapTotal operators may not be as
obvious.

For the BitmapShift operator, any 2D shift in a bitmap image can be reduced
to a 1D shift in the underlying bits, accompanied by a clear operation on one or two
edges for the exposed borders. Implementing a 1D shift of a bit array requires at
most a left or right shift of B bits per word, with a reassignment of the underlying
word positions. Clearing the borders then requires clearing words where sequences
of 32 or 64 bits are contiguous, and partial clears of the remaining words. The
overall cost of this operator, although greater than the XOR or AND operators, is
still modest. This BitmapShift implementation includes an additional Boolean
parameter that turns off border clearing. This optimizes the shifting of the threshold
bitmaps, which have their borders cleared later by the exclusion bitmap, and thus
the BitmapShift operator does not need to clear them.
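
The 1D shift at the heart of this operator can be sketched as follows. This is not the implementation referred to above, only an illustration under stated assumptions: bits are packed 32 per word, the shift is toward higher bit indices only, border clearing for the 2D case is omitted, and the name shift_bits_up is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift a packed bit array toward higher bit indices by n bits,
 * writing into dst and filling vacated low positions with zeros.
 * Bit i lives at bit (i % 32) of word i / 32; src and dst must not overlap. */
void shift_bits_up(const uint32_t *src, uint32_t *dst, size_t nwords, size_t n)
{
    size_t word_off = n / 32;               /* whole-word reassignment */
    unsigned bit_off = (unsigned)(n % 32);  /* remaining shift within each word */
    size_t i;
    for (i = 0; i < nwords; i++) {
        uint32_t v = 0;
        if (i >= word_off) {
            v = src[i - word_off] << bit_off;
            if (bit_off != 0 && i > word_off)   /* carry bits from the previous word */
                v |= src[i - word_off - 1] >> (32 - bit_off);
        }
        dst[i] = v;
    }
}
```

Each output word is built from at most two input words, which is why the cost remains close to that of the XOR and AND operators.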

For the BitmapTotal operator, a table of 256 integers is computed corre- 
sponding to the number of 1 bits in the binary values from 0 to 255 (i.e., 0, 1, 1, 
2, 1, 2, 2, 3, 1, ..., 8). Each word of the bitmap can then be broken into chunks 
(measured in bytes), and used to look up the corresponding bit counts from the 
precomputed table. The bit counts are then summed to yield the correct total. This 
results in a speedup of at least 8 times over counting individual bits, and may be 
further accelerated by special-case checking for zero words, which occur frequently 
in this application. 
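
A minimal sketch of the table-driven bit count follows. The names bit_count_table, init_bit_count_table, and bitmap_total are assumptions made for illustration, as is the choice of 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned char bit_count_table[256];

/* Fill the table so entry v holds the number of 1 bits in byte value v. */
void init_bit_count_table(void)
{
    int v;
    bit_count_table[0] = 0;
    for (v = 1; v < 256; v++)
        bit_count_table[v] = (unsigned char)((v & 1) + bit_count_table[v >> 1]);
}

/* Sum of all 1 bits in a packed bitmap, one table lookup per byte. */
unsigned long bitmap_total(const uint32_t *words, size_t nwords)
{
    unsigned long total = 0;
    size_t i;
    for (i = 0; i < nwords; i++) {
        uint32_t w = words[i];
        if (w == 0)              /* zero words are common; skip the lookups */
            continue;
        total += bit_count_table[w & 0xFF]
               + bit_count_table[(w >> 8) & 0xFF]
               + bit_count_table[(w >> 16) & 0xFF]
               + bit_count_table[w >> 24];
    }
    return total;
}
```

The table must be initialized once with init_bit_count_table before bitmap_total is called.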


4.5.4 RESULTS 


Figure 4.11 shows the results of applying the MTB image alignment algorithm to 
all five exposures of the Italian stairwell, with detailed close-ups showing before 
and after alignment. The misalignment shown is typical of a handheld exposure 
sequence, requiring translation of several pixels on average to bring the exposures 
back atop each other. We have found that even tripod exposures sometimes need 
minor adjustments of a few pixels for optimal results. 

After applying this translational alignment algorithm to over 100 handheld ex- 
posure sequences, a success rate of about 84% was found, with 10% giving un- 
satisfactory results due to image rotation. About 3% failed due to excessive scene 
motion — usually waves or ripples on water that happened to be near the threshold 
value and moved between frames — and another 3% had too much high-frequency 
content, which made the MTB correspondences unstable. Most of the rotation fail- 
ures were mild, leaving at least a portion of the HDR image well aligned. Other 
failures were more dramatic, throwing alignment off to the point where it was 
better not to apply any translation at all. 

FIGURE 4.11 An HDR image composited from unaligned exposures (left) and detail (top
center). Exposures aligned with the MTB algorithm yield a superior composite (right) with clear
details (bottom center).


4.6 DERIVING THE CAMERA RESPONSE FUNCTION 


Combining LDR exposures into an HDR image requires knowledge of the camera 
response function to linearize the data. In general, the response function is not pro- 
vided by camera makers, who consider it part of their proprietary product differ- 
entiation. Assuming an sRGB response curve (as described in Chapter 2) is unwise, 
because most makers boost image contrast beyond the standard sRGB gamma to 
produce a livelier image. There is often some modification as well at the ends of the 
curves, to provide softer highlights and reduce noise visibility in shadows. However, 
as long as the response is not altered by the camera from one exposure to the next, 
it is possible to deduce this function given a proper image sequence. 


4.6.1 DEBEVEC AND MALIK TECHNIQUE


Debevec and Malik [18] demonstrated a simple and robust technique for deriving 
the camera response function from a series of aligned exposures, extending earlier 
work by Mann and Picard [5]. The essential idea is that by capturing different ex- 
posures of a static scene one is effectively sampling the camera response function at 
each pixel. This is best demonstrated graphically. 

FIGURE 4.12 Plot of g(Zij) from three pixels observed in five images, assuming unit radiance
at each pixel. The images are shown in Figure 4.13. The horizontal axis is log exposure (Ei * (Δt)j).


FIGURE 4.13 Three sample positions over five exposures shown in Figure 4.12.


Figure 4.12 shows three separate image positions sampled at five different exposures
(Figure 4.13). The relative exposure ratios at each of the three positions are
given by the speed settings on the camera, and thus we know the shape of the response
function at three different parts of the curve. However, we do not know how
these three curve fragments fit together. Debevec and Malik resolved this problem
using linear optimization to find a smooth curve that minimizes the mean-squared
error over the derived response function. The objective function they use to derive
the logarithmic response function g(Zij) is as follows.


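$$
O = \sum_{i=1}^{N} \sum_{j=1}^{P} \left\{ w(Z_{ij}) \left[ g(Z_{ij}) - \ln E_i - \ln \Delta t_j \right] \right\}^2
  + \lambda \sum_{z=Z_{\min}+1}^{Z_{\max}-1} \left[ w(z)\, g''(z) \right]^2
$$


Here, $\Delta t_j$ is the exposure time for exposure j, $E_i$ is the film irradiance value at
image position i, and $Z_{ij}$ is the recorded pixel value at position i and exposure j. The
parameter $\lambda$ weights the smoothness term. The weighting function $w(Z_{ij})$ is a
simple hat function, as follows.


$$
w(z) =
\begin{cases}
z - Z_{\min} & \text{for } z \le \tfrac{1}{2}(Z_{\min} + Z_{\max}) \\
Z_{\max} - z & \text{for } z > \tfrac{1}{2}(Z_{\min} + Z_{\max})
\end{cases}
$$
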
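
For 8-bit data the hat function reduces to a few integer operations. The sketch below assumes Zmin = 0 and Zmax = 255; the constant names and the function name hat_weight are introduced only for illustration.

```c
/* Hat weight for 8-bit pixel values, assuming Zmin = 0 and Zmax = 255. */
enum { Z_MIN = 0, Z_MAX = 255 };

int hat_weight(int z)
{
    /* 2*z <= Zmin + Zmax is the integer form of z <= (Zmin + Zmax) / 2. */
    if (2 * z <= Z_MIN + Z_MAX)
        return z - Z_MIN;
    return Z_MAX - z;
}
```

The weight is zero at both ends of the range and peaks at 127 for the two mid-range values 127 and 128.
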
This equation is solved using singular-value decomposition to obtain an optimal
value for g(Z) for every possible value of Z, or 0 to 255 for an 8-bit image. Each 
of the RGB channels is treated separately, yielding three independent response func- 
tions. This assumes that interactions between the channels can be neglected. Al- 
though this assumption is difficult to defend from what we know about camera 
color transformations, it seems to work fairly well in practice. 

Figure 4.14 shows the result of aligning the three curves from Figure 4.12, and 
by applying the minimization technique to many image samples it is possible to 
obtain a smooth response function for each channel. 


FIGURE 4.14 Normalized plot of g(Zij) after determining pixel exposures. The horizontal axis
is log exposure (Ei * (Δt)j).


4.6.2 MITSUNAGA AND NAYAR TECHNIQUE


Mitsunaga and Nayar presented a similar approach, in which they derive a poly- 
nomial approximation to the response function [82] rather than the enumerated 
table of Debevec and Malik. The chief advantage they cite in their technique is the 
ability to resolve the exact exposure ratios in addition to the camera response func- 
tion. This proves important for lower-cost consumer equipment whose aperture and 
shutter speed may not be known exactly. Mitsunaga and Nayar define the following 
polynomial of degree N for their response function:


$$
f(M) = \sum_{n=0}^{N} c_n M^n
$$
For consistency with Debevec and Malik, we suggest the following variable replacements:


N → K

n → k

M → Z

Q → P

q → j

The final response function is thus defined by the N + 1 coefficients of this polynomial,
$\{c_0, \ldots, c_N\}$. To determine these coefficients, they minimize the following
error function for a given candidate exposure ratio, $R_{q,q+1}$ (the scale ratio between
exposure q and q + 1):


$$
\varepsilon = \sum_{q=1}^{Q-1} \sum_{p=1}^{P}
  \left[ \sum_{n=0}^{N} c_n M_{p,q}^n - R_{q,q+1} \sum_{n=0}^{N} c_n M_{p,q+1}^n \right]^2
$$

The minimum is found by determining where the partial derivatives with respect
to the polynomial coefficients are all zero (i.e., solving the following system of
N + 1 linear equations).


$$
\frac{\partial \varepsilon}{\partial c_n} = 0
$$

As in previous methods, they only solve for the response up to some arbitrary scal- 
ing. By defining f(1) = 1, they reduce the dimensionality of their linear system by 
one coefficient, substituting 


$$
c_N = 1 - \sum_{n=0}^{N-1} c_n
$$

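The final N × N system can be written as follows.


$$
\begin{bmatrix}
\sum_{q=1}^{Q-1} \sum_{p=1}^{P} d_{p,q,0} (d_{p,q,0} - d_{p,q,N}) & \cdots &
\sum_{q=1}^{Q-1} \sum_{p=1}^{P} d_{p,q,0} (d_{p,q,N-1} - d_{p,q,N}) \\
\vdots & \ddots & \vdots \\
\sum_{q=1}^{Q-1} \sum_{p=1}^{P} d_{p,q,N-1} (d_{p,q,0} - d_{p,q,N}) & \cdots &
\sum_{q=1}^{Q-1} \sum_{p=1}^{P} d_{p,q,N-1} (d_{p,q,N-1} - d_{p,q,N})
\end{bmatrix}
\begin{bmatrix} c_0 \\ \vdots \\ c_{N-1} \end{bmatrix}
=
\begin{bmatrix}
-\sum_{q=1}^{Q-1} \sum_{p=1}^{P} d_{p,q,0}\, d_{p,q,N} \\
\vdots \\
-\sum_{q=1}^{Q-1} \sum_{p=1}^{P} d_{p,q,N-1}\, d_{p,q,N}
\end{bmatrix}
$$

Here,


$$
d_{p,q,n} = M_{p,q}^n - R_{q,q+1} M_{p,q+1}^n
$$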


The original Mitsunaga and Nayar formulation only considers adjacent exposures. 
In practice, the system is more stable if all exposure combinations are considered. 
The error function becomes a triple sum by including a sum over q′ ≠ q instead
of just comparing q to q + 1. This then gets repeated in the sums of the combined
system of equations, where $d_{p,q,n}$ is replaced by


$$
d_{p,q,q',n} = M_{p,q}^n - R_{q,q'} M_{p,q'}^n
$$

To compute the actual exposure ratios between images, Mitsunaga and Nayar apply
an iterative technique, where the previous system of equations is solved repeatedly,
and between each solution the exposure ratios are updated using the following.


Iteration is complete when the polynomial is no longer changing significantly, as
follows.


$$
\left| f^{(k)}(M) - f^{(k-1)}(M) \right| < \varepsilon, \quad \forall M
$$


This leaves just one final problem: What is the polynomial degree N? The authors recommend solving for every degree polynomial up to some maximum exponent (e.g., 10), accepting the solution with the smallest error, ε. Fortunately, the solution process proceeds quickly and this is not much of a burden. It is a good idea to ensure that the same degree is selected for all color channels, and thus a combined ε function is preferable for this final test.


4.6.3 CHOOSING IMAGE SAMPLES FOR RESPONSE RECOVERY


Each of the techniques described for camera response recovery requires a set of 
intelligently selected samples from the exposure sequence. In principle, one could 
use every pixel from every image, but this would only add to the computation time 
while actually reducing stability in the solution due to misaligned and noisy data. 
Once the exposures have been aligned, the following procedure for selecting sample 
patches is recommended. 


1 Sort the exposures from lightest to darkest.

2 Select an appropriate sample patch size and an optimal number of patches,
and initialize (clear) the patch list.

3 Determine how many patches from the previous exposure are still valid for
this one.

4 Compute how many more patches are needed for this exposure. If none, go 
to the next exposure (Step 3). 

5 Search for valid patches using randomized rejection sampling. A valid patch 
is brighter than any of the previous exposure’s patches, does not overlap any 
other patch, and possesses a low internal variance. It is also within the valid 
range for this exposure. 

6 Once we have found enough patches or given up due to an excess of rejected 
samples, we continue to the next exposure (Step 3). 


A target of 50 12-by-12 pixel patches per exposure seems to work well. In cases 
where the darker exposures do not use their full range, it becomes difficult to find 
new patches that are brighter than the previous exposure. In practice, this does not 
affect the result significantly, but it is important for this reason to place a limit on 
the rejection sampling process in Step 5, lest we go into an infinite loop. 
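
The selection procedure above can be sketched as follows. This is one possible reading of the steps, not the authors' code: the name `select_patches`, the valid range `lo`/`hi`, the variance limit `max_var`, and the cap `max_tries` are illustrative assumptions, and the exposures are taken to be aligned grayscale arrays scaled to [0, 1].

```python
import numpy as np

def select_patches(exposures, size=12, target=50, lo=0.05, hi=0.95,
                   max_var=1e-3, max_tries=5000, seed=0):
    """Return, per exposure, the (row, col) corners of the patches in use.

    exposures: aligned 2-D float arrays in [0, 1], sorted lightest to darkest (Step 1).
    """
    rng = np.random.default_rng(seed)
    height, width = exposures[0].shape
    patches, per_exposure = [], []                    # Step 2: clear the patch list

    def block(img, y, x):
        return img[y:y + size, x:x + size]

    for img in exposures:
        # Step 3: keep earlier patches that are still inside the valid range.
        patches = [(y, x) for y, x in patches if lo < block(img, y, x).mean() < hi]
        # New patches must be brighter than every patch carried over.
        floor = max((block(img, y, x).mean() for y, x in patches), default=lo)
        tries = 0
        # Steps 4-6: rejection sampling, with a hard cap so the loop always ends.
        while len(patches) < target and tries < max_tries:
            tries += 1
            y = int(rng.integers(0, height - size + 1))
            x = int(rng.integers(0, width - size + 1))
            candidate = block(img, y, x)
            overlaps = any(abs(y - py) < size and abs(x - px) < size
                           for py, px in patches)
            if (floor < candidate.mean() < hi and candidate.var() <= max_var
                    and not overlaps):
                patches.append((y, x))
        per_exposure.append(list(patches))
    return per_exposure


# Hypothetical test scene: a 10 x 10 grid of flat 40 x 40 tiles of increasing radiance.
scene = np.kron(np.linspace(0.02, 4.0, 100).reshape(10, 10), np.ones((40, 40)))
exposures = [np.clip(scene * t, 0.0, 1.0) for t in (1.0, 0.5, 0.25)]
chosen = select_patches(exposures)
print([len(p) for p in chosen])
```

The `max_tries` cap plays the role of the limit on rejection sampling mentioned above: when a darker exposure cannot supply brighter patches, the search simply gives up and moves on.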

Figure 4.15 shows an exposure sequence and the corresponding patch locations. 
Adjacent exposures have nearly the same patch samples, but no patch sample sur- 
vives in all exposures. This is due to the range restriction applied in Step 5 to avoid 
unreliable pixel values. Figure 4.16 shows a close-up of the middle exposure with 
the patches shown as boxes, demonstrating the low variance in the selected regions. 
By rejecting high-contrast areas, errors due to exposure misalignment and sensor 
noise are minimized. 

Finally, Figure 4.17 shows the recovered response function for this sequence fit- 
ted with a third-order polynomial using Mitsunaga and Nayar’s method, and com- 
pares it to the standard sRGB response function. The camera produces an artificially 
exaggerated contrast with deeper blacks on its LDR exposures. This type of response 
manipulation is fairly standard for consumer-grade cameras, and many professional 
SLRs as well. 


4.6.4 CAVEATS AND CALIBRATION 


FIGURE 4.15 Red squares indicate size and location of patch samples in each exposure.

To apply these techniques successfully, it helps to follow some additional guidelines, as follows.


• Use aperture priority or manual exposure mode, so that only the exposure time is allowed to vary. This reduces problems associated with vignetting (light falloff toward the edge of the image).

• Fix the camera’s white balance on a specific setting for the entire sequence, preferably daylight (i.e., D65).

• If the camera offers an “optimized color and contrast” mode, switch it off. The more settings you can fix manually, the less likely the camera will be altering the response function between exposures. This applies particularly to automatic ISO/ASA and programmed exposure modes.

• Use a tripod if possible, and control your camera via a tether to a laptop computer if this option is available. The less the camera is touched during a sequence, the fewer alignment problems you will experience.

FIGURE 4.16 Close-up of central exposure showing selected patch regions.


In general, it works best to calibrate your camera’s response one time, and then reuse 
this calibration for later exposure sequences. In this way, the scene and exposure 
sequence may be optimized for camera response recovery. For such a sequence, 
perform the following. 


• Set the camera on a tripod and use a tether if available. Alignment may still be necessary between exposures if the camera is touched during the sequence, but the method described in the previous section will work far better with a tripod than a handheld sequence in which the exposure is being changed manually.

• Choose a scene with large gray or white surfaces that provide continuous gradients for sampling. The closer your scene is to a neutral color, the less likely color transforms in the camera will undermine the response recovery process.

• Choose a scene with very bright and very dark areas, and then take a long sequence of exposures separated by 1 EV (a factor of two in exposure time). The darkest exposure should have no RGB values greater than 200 or so, and the lightest exposure should have no RGB values less than 20 or so. Do not include an excess of exposures beyond this range, as it will do nothing to help with response recovery and may hurt.

• If you have access to a luminance meter, take a reading on a gray card or uniform area in your scene to provide absolute response calibration.

FIGURE 4.17 Recovered red, green, and blue response functions (relative response) for the image sequence shown in Figure 4.15.
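
The exposure-range guideline in the list above is easy to check automatically before running response recovery. The helper below is only a sketch: the name `sequence_range_ok` and the assumption of 8-bit RGB arrays ordered from lightest to darkest are ours, and the limits of 200 and 20 are the approximate values quoted in the text.

```python
import numpy as np

def sequence_range_ok(exposures, dark_limit=200, light_limit=20):
    """exposures: 8-bit RGB arrays ordered from lightest to darkest."""
    lightest, darkest = exposures[0], exposures[-1]
    # Darkest frame: nothing above ~200. Lightest frame: nothing below ~20.
    return bool(darkest.max() <= dark_limit and lightest.min() >= light_limit)


# Hypothetical two-frame sequences made of flat images.
good = [np.full((4, 4, 3), 180, np.uint8), np.full((4, 4, 3), 12, np.uint8)]
bad = [np.full((4, 4, 3), 180, np.uint8), np.full((4, 4, 3), 255, np.uint8)]
print(sequence_range_ok(good), sequence_range_ok(bad))
```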


Once a camera has been characterized in this way, it is possible to combine handheld 
bracketed sequences that are too short to reliably recover the response function. 


4.7 GHOST REMOVAL 


Once the exposures are aligned with each other and the camera’s response curve is 
determined, we may safely combine the images (as described in Section 4.3). How- 
ever, if some person or object was moving during the image sequence acquisition 
they may appear as “ghosts” in the combined result, due to their multiple locations. 
The technique described by Kang et al. [63] attempts to address this problem dur- 
ing alignment by warping pixels according to local content, but even if this can be 
done correctly in the presence of people who change posture as well as position it 
still leaves the problem of filling in holes that were obstructed in some views but 
not in others. 

A simpler approach is based on the observation that each exposure in the se- 
quence is self-consistent, which means that we can simply choose one exposure or 
another in specific regions to obtain a ghost-free result. The HDR capacity may be 
lost within these selected regions, but as long as the ghosts are local and compact, 
the overall image will still capture the full range of light. 

Figure 4.18 shows an HDR image captured from a bracketed sequence of five 
exposures, excerpted in the left-hand side of the figure. People walking in and out 
of the temple result in a trail of ghosts appearing in the combined result. 

Fortunately, it is relatively easy to detect motion of this type in exposure sequences. As the images are combined using the weighted average (described in Section 4.3), the weighted variance can be computed simultaneously at each pixel, shown in Figure 4.19. The weighted variance is defined as the weighted sum of squares at each pixel over the square of the weighted average, the quantity minus 1. (We compute these quantities separately for red, green, and blue channels and then take the maximum at each pixel.) In addition to the moving people, some variance is detected at high-contrast edges due to imperfections in the lens and sensor. These regions will usually be rejected by a minimum area constraint, but may cause false positives on small stationary objects.

FIGURE 4.18 Five exposures combined into a single HDR image, where people moving through the scene have caused ghosting in the result.
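
A minimal sketch of the variance computation follows. It assumes that the "weighted sum of squares" is normalized by the sum of the weights, so that a pixel whose value is identical in every exposure has a variance of zero; that normalization, the function name `weighted_variance`, and the array layout are assumptions of this example.

```python
import numpy as np

def weighted_variance(radiance, weights, eps=1e-12):
    """Relative variance per pixel, maximized over the color channels.

    radiance, weights: arrays of shape (Q, H, W, 3), one radiance estimate and
    one weight per exposure, pixel, and channel.
    """
    wsum = np.maximum(weights.sum(axis=0), eps)
    mean = (weights * radiance).sum(axis=0) / wsum           # weighted average
    mean_sq = (weights * radiance ** 2).sum(axis=0) / wsum   # weighted mean of squares
    variance = mean_sq / (mean ** 2 + eps) - 1.0
    return variance.max(axis=-1)                             # worst channel per pixel


# Three exposures of a 2 x 2 image; one pixel changes in the last exposure.
radiance = np.ones((3, 2, 2, 3))
radiance[2, 0, 0, :] = 4.0
variance = weighted_variance(radiance, np.ones_like(radiance))
print(variance)
```

In this toy case the changing pixel sees the values 1, 1, and 4, giving 6/4 − 1 = 0.5, while the static pixels stay at zero.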

At this point, the variance image could simply be thresholded and a single ex- 
posure selected to substitute for all high-variance pixels. However, this would in- 
cur significant artifacts. First, parts of moving objects whose pixels happened to 
correspond to the background locally would break apart. Second, and more seri- 
ously, choosing a single exposure for all high-variance pixels would result in exces- 
sive information loss, as problem pixels may be found in very different brightness 
regions. 

FIGURE 4.19 Variance computed at each pixel over our exposure sequence, showing where information is changing unexpectedly due to movement.

The algorithm could be modified to pick the best exposure for each problem pixel, but this would create an even more serious breakup problem because different parts of the same object will be better exposed in different frames, where the object’s position is also different. It is therefore important to isolate separable, high-variance regions and choose the best exposure for each. Such a segmentation is shown in Figure 4.20. This segmentation is computed as follows.


1 Reduce the variance image by a factor of 10 in each dimension to save computation.

2 Compute the threshold bitmap where local variance is greater than 0.18.

3 Smear the threshold bitmap around a radius of 3 pixels to cover edges and join adjacent ghost regions.

4 Compute a “background” bitmap segment from a union of contiguous (flood-filled) low-variance regions that cover at least 0.1% of the image.

5 Identify “ghost” bitmap segments as disjoint flood-filled regions within the background segment, with each ghost also covering at least 0.1% of the image.

FIGURE 4.20 Segmented regions corresponding to isolated ghosts in our exposure sequence. Segment colors were randomly selected.
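
The five steps can be sketched in plain NumPy as shown below. The details are assumptions made for illustration: box averaging for the reduction, a disc-shaped smear, 4-connected flood fill, and the helper names `smear`, `label`, and `segment_ghosts` are not taken from the text.

```python
from collections import deque
import numpy as np

def smear(mask, radius):
    """Grow a boolean mask by a disc of the given radius."""
    h, w = mask.shape
    padded = np.pad(mask, radius)
    out = np.zeros_like(mask)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx <= radius * radius:
                out |= padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
    return out

def label(mask):
    """4-connected flood fill; returns (label image, number of regions)."""
    h, w = mask.shape
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        count += 1
        labels[seed] = count
        queue = deque([seed])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count

def keep_large(mask, min_area):
    """Label a mask and renumber only the regions of at least min_area pixels."""
    labels, n = label(mask)
    kept, k = np.zeros(mask.shape, dtype=int), 0
    for i in range(1, n + 1):
        region = labels == i
        if region.sum() >= min_area:
            k += 1
            kept[region] = k
    return kept, k

def segment_ghosts(variance, factor=10, threshold=0.18, radius=3, min_frac=0.001):
    h, w = variance.shape[0] // factor, variance.shape[1] // factor
    small = variance[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))  # Step 1
    mask = smear(small > threshold, radius)               # Steps 2 and 3
    min_area = min_frac * h * w
    background, _ = keep_large(~mask, min_area)           # Step 4
    return keep_large(background == 0, min_area)          # Step 5


# Hypothetical variance image: one moving object plus a single noisy pixel.
variance = np.zeros((200, 200))
variance[50:90, 100:140] = 1.0
variance[10, 10] = 1.0
ghosts, count = segment_ghosts(variance)
print(count)
```

The isolated noisy pixel is averaged away by the reduction in Step 1, so only the larger moving region survives as a ghost segment.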


In Figure 4.20, the tops of the two silhouetted poles were inadvertently picked, due to the high variance at their edges. In addition, a large region in which people were passing each other on the walk ended up as one segment (violet). Fortunately, this segment is contained in a similarly lighted region, and thus a single exposure is adequate to capture its dynamic range. The uneven edges of some regions indicate low-variance pixels, where we limit our ghost removal using linear interpolation as explained below.

FIGURE 4.21 The combined HDR result with ghosts removed.

To choose which exposure to use in which region, a histogram is generated from the floating-point values for each ghost segment. We then consider the largest value after ignoring the top 2% as outliers and choose the longest exposure that includes this 2% maximum within its valid range. We apply the corresponding exposure multiplier for each ghost segment and then linearly interpolate between this exposure and the original HDR result using each pixel’s variance as our mixing coefficient. This ensures that extremely low-variance pixels within an identified ghost segment are left unaltered. Figure 4.21 shows the combined result with ghosts removed. The final image is not perfect, and one man’s bare foot has been summarily amputated, but the overall result is an improvement, and this technique is quick and relatively straightforward.
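
The exposure choice and the final blend might look like the sketch below. Several details are assumptions: radiance multiplied by exposure time is taken to give a linear sensor value in [0, 1], `max_valid` stands in for the upper end of an exposure's valid range, the 98th percentile replaces an explicit histogram, and the variance is clamped to [0, 1] before it is used as a mixing coefficient.

```python
import numpy as np

def choose_exposure(segment_values, exposure_times, max_valid=0.9):
    """Index of the longest exposure that still holds the segment's 2% maximum."""
    times = np.asarray(exposure_times, dtype=float)
    peak = np.percentile(segment_values, 98)       # largest value, ignoring the top 2%
    for idx in np.argsort(times)[::-1]:            # try the longest exposure first
        if peak * times[idx] <= max_valid:
            return int(idx)
    return int(np.argmin(times))                   # fall back to the shortest exposure

def blend(hdr, single, variance):
    """Mix the single-exposure radiance into the HDR result by per-pixel variance."""
    t = np.clip(variance, 0.0, 1.0)[..., None]
    return (1.0 - t) * hdr + t * single


picked = choose_exposure(np.linspace(0.0, 10.0, 101), [1.0, 0.05, 0.01])
mixed = blend(np.ones((2, 2, 3)), np.zeros((2, 2, 3)),
              np.array([[0.0, 1.0], [0.5, 2.0]]))
print(picked, mixed[..., 0])
```

With a mixing coefficient of zero the original HDR pixel is returned unchanged, which is the behavior the text relies on for low-variance pixels inside a ghost segment.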


4.8 LENS FLARE REMOVAL 


After eliminating motion between exposures, there may still be artifacts present 
due to the camera’s optics. Most digital cameras are equipped with optics that are 
consistent with the inherent limitations of 24-bit digital images. In other words, 
manufacturers generally do not expect more than two orders of magnitude to be 
captured in the final image, and thus certain parameters may be relaxed in the lens 
and sensor design relative to a 35-mm-film camera, for example. For an HDR cap- 
ture process, however, the limitations of the system’s optics are more apparent, even 
in a well-made digital camera. Small issues such as the thickness and finish of the 
aperture vanes can make a big difference in the distribution of light on the sensor. 
The quality of coatings on the lenses and the darkness and geometry of the interior 
surrounding the sensor also come into play. Overall, there are many components 
that affect the scattering of light in an image, and it is difficult or impossible to 
arrive at a single set of measurements that characterize the system and its depen- 
dencies. Therefore, we prefer a dynamic solution to the lens flare problem, based 
only on the captured image. 

Because it is important to keep all of the optical properties of the camera consis- 
tent during HDR capture, normally only the shutter speed should be manipulated 
between exposures. Thus, the actual distribution of light on the sensor plane never 
varies, only the length of time the sensor is exposed to it. Therefore, any flare ef- 
fects present in one exposure are present to the same degree in all exposures and 
will sum consistently into our HDR result. For this reason, there is no need to work 
on individual exposures, as it would only serve to increase our computational bur- 
den. The camera’s point spread function (PSF) is a physical measure of the system optics, 
and it may be characterized directly from the recorded radiances in an HDR image. 


4.8.1 THE POINT SPREAD FUNCTION 


The PSF as it is defined here is an idealized radially symmetric characterization of the light falloff surrounding a point of light in a perfectly dark surrounding. It could be measured by making a pinhole in a piece of aluminum foil in front of a lightbulb in a box and photographing it in a completely dark environment, as shown in Figure 4.22. The edge of the hole ought to be perfectly sharp, but it generally is not. The spread of light around the hole corresponds to light scattered within the lens of the digital camera.

FIGURE 4.22 An isolated spot of light in a darkened environment for the purpose of measuring the point spread function of a camera.

This photograph of the pinhole could then be used to correct the combined HDR 
result for other photographs made with precisely the same lens settings — zoom 
and aperture. However, this procedure is a lot to expect of even the most meticulous 
photographer, and because lens flare also depends strongly on dust and oils that 
come and go over time it is not practical to maintain a set of calibrated PSFs for any 
but the most critical applications. 

However, there is a technique whereby the PSF may be approximated based on image content, as we will demonstrate with the HDR capture shown in Figure 4.23. Despite the fact that this image contains no localized bright spots, and hence no easily measured PSF, a reasonable estimate of lens flare may still be obtained.

FIGURE 4.23 Our input test image for estimating lens flare using the same aperture and zoom as in Figure 4.22 (tone mapped with the histogram equalization operator, described in Section 7.2.8).

It is assumed that in the image there exist some dark pixels near very bright (or “hot”) pixels. To the extent this is true, it will be possible to estimate the PSF for the camera.⁶ It is also assumed that the lens flare is radially symmetric. This is admittedly a crude approximation, but it is required by the estimation procedure. Thus, the goal is to find and remove the radially symmetric component of flare. Streaks and other asymmetrical artifacts generated by the camera optics will remain. The automatic flare removal consists of the following steps.


1 Compute two reduced-resolution HDR images: one in color and one in
grayscale. Call these I_CR and I_GR, respectively.

2 Identify “hot” pixels in I_GR, which are over some threshold.

3 Draw annuli around each hot pixel to compute a least squares approximation
to the PSF using the method described in the following section.

4 Apply the PSF to remove flare from the final HDR image.


Reducing the working resolution of our HDR image achieves a major speedup with- 
out significantly impacting the quality of the results, in that flare tends to be a dis- 
tributed phenomenon. A reduced image size of at most 128 pixels horizontally or 
vertically is sufficient. The threshold setting for Step 2 is not particularly impor- 
tant, but we have found a value of 1,000 times the minimum (reduced) pixel value 
to work well for most images. Of course, a different threshold is advisable if the 
minimum is zero. Steps 3 and 4 require some explanation, which we give in the 
following subsections. 
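
Steps 1 and 2 can be sketched as follows for the grayscale image. The box-filter reduction, the function names, and the fallback to the smallest positive value when the minimum is zero are assumptions of this example; the text only says that a different threshold is advisable in that case.

```python
import math
import numpy as np

def reduce_image(img, max_size=128):
    """Box-average a 2-D image so that neither dimension exceeds max_size."""
    factor = math.ceil(max(img.shape) / max_size)
    h, w = img.shape[0] // factor, img.shape[1] // factor
    return img[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

def hot_pixel_mask(gray, ratio=1000.0):
    """Mark pixels brighter than `ratio` times the minimum (reduced) pixel value."""
    positive = gray[gray > 0]
    if positive.size == 0:
        return np.zeros(gray.shape, dtype=bool)
    return gray > ratio * positive.min()       # assumed fallback: smallest positive value


# Hypothetical 300 x 200 luminance image with one very bright pixel.
image = np.ones((300, 200))
image[150, 100] = 1e7
reduced = reduce_image(image)
hot = hot_pixel_mask(reduced)
print(reduced.shape, int(hot.sum()))
```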


4.8.2 ESTIMATING THE PSF 


The PSF defines how light falls off around bright points in the image.⁷ To estimate the PSF, the minimum pixel values around all “hot” pixels in the image are measured, thus arriving at a conservative estimate of the PSF. To do this, the potential contributions of all hot pixels at a certain distance from the darker (non-hot) pixels are summed to build up an estimate of the PSF from the corresponding minima, radius by radius. For example, Figure 4.24 shows an image with exactly three of these super-bright pixels. The same radius is drawn around all three pixels, creating three overlapping circles. If the PSF were known a priori, we could compute the contribution of these hot pixels at this distance by multiplying the PSF, which is a function of radius, by each hot pixel value. Conversely, dividing the darker pixels around each circle by the circle’s center gives an upper bound of the PSF.

6 If this assumption is false, and there are no dark pixels near sources, lens flare will probably go unnoticed and there is no need to remove it.

7 In fact, it defines how light falls off around any point in the image, but only the bright points matter because the falloff is so dramatic.

FIGURE 4.24 An example image with exactly three bright pixels, surrounded by circles showing where their PSF influences overlap. Each PSF radius will contain exactly one minimum, marked with an X in this example.

Furthermore, the PSF at this distance cannot be greater than the minimum of all darker-pixel/hot-pixel ratios. Where the circles overlap, the sum of hot pixel contributions should be considered. In fact, a convenient approach is to sum all three hot pixel values around each circle in another grayscale image. (This example shows three distinct circles, but in general there are many hot pixels adjacent to each other, which create a great deal of overlap in the contributing annuli.) The PSF upper bound at that radius will then equal the minimum ratio of the darker pixels in the circles to their hot pixel sums (the point marked with an X in the example). This technique extends directly to any number of hot pixels. The estimation procedure is as follows.


1 For each radius we wish to consider: 

a Sum the hot pixel values into a radius range (annulus) in a separate 
grayscale image. 

b Find the minimum ratio of darker-pixel/hot-pixel sum for all annuli. 

2 If the minimum ratio is not less than the previous (smaller) radius, discard 
it (because we assume the PSF is monotonically decreasing). 

3 For each minimum ratio pixel, identified for each sample radius, consider all 
flare contributions to this pixel over the entire image as described below. 
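
Steps 1 and 2 of this procedure can be sketched as below. The one-pixel-wide annulus, the function name `psf_upper_bounds`, and the synthetic falloff of 0.01/r² used to exercise it are assumptions for illustration; Step 3 (gathering all flare contributions for the fit) is not covered here.

```python
import numpy as np

def psf_upper_bounds(gray, hot_mask, radii):
    """Return (radius, upper bound, (row, col)) for each radius that is kept."""
    yy, xx = np.indices(gray.shape)
    bounds, previous = [], np.inf
    for radius in radii:
        # Step 1a: sum the hot pixel values into an annulus around each hot pixel.
        hot_sum = np.zeros(gray.shape)
        for hy, hx in np.argwhere(hot_mask):
            ring = np.abs(np.hypot(yy - hy, xx - hx) - radius) <= 0.5
            hot_sum[ring] += gray[hy, hx]
        candidates = (hot_sum > 0) & ~hot_mask
        if not candidates.any():
            continue
        # Step 1b: minimum ratio of darker pixel to hot-pixel sum.
        ratio = np.full(gray.shape, np.inf)
        ratio[candidates] = gray[candidates] / hot_sum[candidates]
        y, x = np.unravel_index(np.argmin(ratio), ratio.shape)
        # Step 2: keep the bound only if it is below the one at the smaller radius.
        if ratio[y, x] < previous:
            previous = ratio[y, x]
            bounds.append((radius, float(ratio[y, x]), (int(y), int(x))))
    return bounds


# Synthetic image: a single hot pixel whose flare falls off as 0.01 / r^2.
yy, xx = np.indices((41, 41))
dist = np.hypot(yy - 20, xx - 20)
gray = 1000.0 * 0.01 / np.maximum(dist, 1.0) ** 2
gray[20, 20] = 1000.0
bounds = psf_upper_bounds(gray, gray > 500.0, radii=[2, 4, 8, 16])
print(bounds)
```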


Once we have an estimate of the upper limit of the PSF at each radius, these minimum pixels can be used to fit a third-degree polynomial, p(x), using the reciprocal of the input radius for x.⁸ For each identified minimum pixel position with value $P_i$, we can write the following equation.

$$P_i = \sum_j P_j \left( C_0 + \frac{C_1}{r_{ij}} + \frac{C_2}{r_{ij}^2} + \frac{C_3}{r_{ij}^3} \right)$$

Here, the $P_j$s are the contributing pixel values over the rest of the image, and the $r_{ij}$s are the distances between the minimum pixel $P_i$ and each contributing pixel position. This equation can be rewritten as follows.

$$P_i = C_0 \sum_j P_j + C_1 \sum_j \frac{P_j}{r_{ij}} + C_2 \sum_j \frac{P_j}{r_{ij}^2} + C_3 \sum_j \frac{P_j}{r_{ij}^3}$$

The sums in this equation then become coefficients in a linear system in which the four fitting parameters ($C_0$ through $C_3$) are the unknowns. As long as there are more than four minimum pixel values, $P_i$, it should be possible to solve this as an overdetermined system using standard least squares minimization. Heuristically, a better solution may be obtained if we assign minimum and maximum permitted values for the distance between pixels, $r_{ij}$. Anytime the actual distance is less than the minimum radius (3 pixels in the reduced image), we use a distance of 3 instead. Similarly, the distance is clamped to a maximum of half the image width. This avoids stability problems and sensitivity to local features in our image. It also avoids the possibly incorrect removal of flare too close to light sources. This is generally impossible anyway, in that flare from the lens can be so great that the underlying information is washed out. In such cases, no recovery can be made. This often happens at bright source boundaries.

8 The use of a third-degree polynomial and fitting to the reciprocal of distance are heuristic choices we have found to produce good results at an economical cost.
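
The least squares fit can be sketched as follows. The function name `fit_psf`, its argument layout, and the synthetic data are assumptions for this example; in particular, the `sources` image holds whatever pixel values are treated as contributors (only two hot pixels in the check below, so that the expected answer is known exactly).

```python
import numpy as np

def fit_psf(min_values, min_positions, sources, r_min=3.0, r_max=None):
    """Fit C_0..C_3 of a cubic in 1/r to the minimum-pixel equations."""
    r_max = sources.shape[1] / 2 if r_max is None else r_max
    yy, xx = np.indices(sources.shape)
    rows = []
    for y, x in min_positions:
        r = np.clip(np.hypot(yy - y, xx - x), r_min, r_max)   # clamped distances r_ij
        contrib = sources.copy()
        contrib[y, x] = 0.0                                   # exclude the pixel itself
        rows.append([(contrib / r ** k).sum() for k in range(4)])
    coeffs, *_ = np.linalg.lstsq(np.array(rows), np.asarray(min_values, float), rcond=None)
    return coeffs


# Synthetic check: two contributing pixels and a known cubic in 1/r.
true = np.array([1e-5, 1e-3, 1e-2, 1e-1])
sources = np.zeros((64, 64))
sources[32, 32], sources[32, 30] = 1000.0, 500.0
positions = [(32, 32 + r) for r in (3, 5, 8, 12, 16, 20, 25, 30)]
values = [sum(sources[sy, sx] * sum(true[k] / np.hypot(y - sy, x - sx) ** k for k in range(4))
              for sy, sx in ((32, 32), (32, 30)))
          for y, x in positions]
fitted = fit_psf(values, positions, sources)
print(fitted)
```

Eight equations for four unknowns make this an overdetermined system, as the text requires; the distance clamp is applied inside the fit, so the test positions were chosen to lie within the permitted range.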

Figure 4.25 compares the point spread function measured in Figure 4.22 to our estimate derived solely from the input image in Figure 4.23. Other than the artificial plateau we imposed by constraining the minimum $r_{ij}$ to 3 pixels, the two curves are a reasonable match. The fitted function shows a slightly greater flare than the measured one, but this is explained by the fact that the measurement was based on a spot near the center of the image. Optical flare becomes more pronounced as one moves farther toward the edges of an image, especially in a wide-angle lens. Since the fitting function was applied over the entire image, we would expect the globally estimated PSF to be slightly greater than a PSF measured at the center.

FIGURE 4.25 Comparison between directly measured PSF from Figure 4.22 and function fitted using image in Figure 4.23.


4.8.3 REMOVING THE PSF 


Given an estimate of the PSF, flare removal is straightforward. For each hot pixel in 
the image, we subtract the PSF times this pixel value from its surroundings. Because 
the neighborhood under consideration may extend all the way to the edge of the 
image, this can be an expensive operation. Once again, working with a reduced 
image lowers the computational cost to manageable levels. The steps for removing 
the PSF are as follows. 


1 Create a reduced-resolution flare image, F_CR, and initialize it to black.

2 For each hot pixel in the reduced image I_CR, multiply by the PSF and add the
product into F_CR.

3 If the value of any pixel in F_CR is larger than its corresponding pixel in I_CR,
reduce the magnitude of F_CR uniformly to compensate.

4 Upsample F_CR using linear interpolation and subtract from the original HDR
image.


Step 3 ensures that no negative pixels are generated in the output and is necessary because the fitting method does not guarantee the most conservative PSF. Depending on the interpolation and the local variance of the original pixels, we may still end up with negative values during Step 4 and should truncate these where they occur. An example result of automatic flare removal is shown in Figure 4.26, along with the reduced-resolution flare image generated during Step 2.
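
The four removal steps can be sketched as below. For brevity this example works on a single grayscale channel rather than the color image, evaluates the PSF as a cubic in 1/r with the same distance clamping used for the fit, and implements the linear upsampling with `np.interp`; the function names and the test data are assumptions.

```python
import numpy as np

def upsample(img, shape):
    """Separable linear interpolation of a 2-D image to the given shape."""
    h, w = img.shape
    ys, xs = np.linspace(0, h - 1, shape[0]), np.linspace(0, w - 1, shape[1])
    wide = np.array([np.interp(xs, np.arange(w), row) for row in img])
    return np.array([np.interp(ys, np.arange(h), col) for col in wide.T]).T

def remove_flare(hdr, reduced, hot_mask, coeffs, r_min=3.0):
    h, w = reduced.shape
    yy, xx = np.indices(reduced.shape)
    flare = np.zeros(reduced.shape)                       # Step 1: black flare image
    for y, x in np.argwhere(hot_mask):                    # Step 2: accumulate PSF * hot pixel
        r = np.clip(np.hypot(yy - y, xx - x), r_min, w / 2)
        psf = sum(c / r ** k for k, c in enumerate(coeffs))
        psf[y, x] = 0.0                                   # no flare onto the source itself
        flare += reduced[y, x] * psf
    # Step 3: scale the flare uniformly so it never exceeds the reduced image.
    ratio = np.where(flare > 0, reduced / np.where(flare > 0, flare, 1.0), np.inf)
    flare *= min(1.0, ratio.min())
    # Step 4: upsample, subtract, and truncate any remaining negative values.
    cleaned = np.maximum(hdr - upsample(flare, hdr.shape), 0.0)
    return cleaned, flare


# Hypothetical input: uniform background with one hot pixel, PSF = 0.01 / r^2.
reduced = np.ones((32, 32))
reduced[16, 16] = 1000.0
hdr = np.kron(reduced, np.ones((4, 4)))
cleaned, flare = remove_flare(hdr, reduced, reduced > 500.0, [0.0, 0.0, 0.01, 0.0])
print(flare.max(), cleaned.min())
```

Here the unscaled flare near the source (about 1.11) exceeds the background value of 1, so Step 3 scales the whole flare image down by a factor of 0.9.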


4.9 DIRECT CAPTURE OF HDR IMAGERY 


With the possible exception of lens flare removal, the techniques explained in the last section might be unnecessary if we had a digital sensor that could record the full dynamic range of a scene in a single shot. In fact, such sensors are being actively developed, and some are even being marketed, but only a few integrated solutions are commercially available: the Autobrite cameras from SMaL Camera Technologies, the SpheroCam HDR panoramic camera from SpheronVR, and the Ladybug spherical camera from Point Grey Research. We will describe each of these systems briefly in this section.

FIGURE 4.26 An image of a rosette window, before flare removal (left) and after (center). The right-hand image shows the estimated PSF applied to hot pixels in the image.


4.9.1 VIPER FILMSTREAM 


Grass Valley, a division of Thomson, introduced the Viper FilmStream camera for digital cinematography in the Fall of 2002 (www.thomsongrassvalley.com/products/cameras/viper/). This is currently the top-end performer for digital capture, and it produces an enormous amount of data (up to 444 Mbytes/sec!). The camera contains three HDTV 1080i (1,920 x 1,080-resolution) CCD sensors (one each for red, green, and blue) and records directly into a 10-bit/channel log format. The camera and its response functions are shown in Figure 4.27.

FIGURE 4.27 The Viper FilmStream camera and its response curve. (Plot: optical transfer curve, VIPER at 3200K; log output signal [A.U.] versus log exposure [Lux.sec].)

This chart shows that the Viper captures about three orders of magnitude, which 
is at least 10 times that of a standard digital video camera, and begins to rival film. 
The equipment is currently available for lease from Thomson. 


4.9.2 SMaL 


SMaL Camera Technologies of Cambridge, Massachusetts (www.smalcamera.com), markets a low-cost VGA-resolution CMOS sensor (the IM-001 Series) which is capable of recording extended-range images at twice video rates (60 fps). Through its unique design, individual pixel sensitivities are adjusted so that the chip captures about twice the dynamic range (in log units) of a standard CCD or CMOS sensor, or about four orders of magnitude. They currently offer two products that incorporate their Autobrite (TM) technology: a credit-card-size still camera and a video surveillance camera (a prototype is shown in Figure 1.9). They also market a “digital imaging kit” to OEMs and system integrators who wish to incorporate the SMaL sensor in their products.

FIGURE 4.28 An HDR image of one of the authors, captured using a SMaL Ultra-Pocket digital camera directly to RGBE format. As we can see in false color, the dynamic range captured in this image is about three orders of magnitude.

Figure 4.28 shows an image captured using the SMaL Ultra-Pocket camera. Due 
to its limited resolution (482 x 642), the SMaL sensor is not well suited to seri- 
ous photographic applications, but this may change with the introduction of larger 
Autobrite arrays. Other aspects of chip performance, such as signal-to-noise ratio at 
each pixel and fill factor, may also affect the applicability of this technology. 


4.9.3 PIXIM 


Pixim of Mountain View, California (www.pixim.com), offers two 720 x 480 CMOS image sensors that boast a 10-bit digital video output with a 95-dB signal-to-noise ratio, corresponding to roughly four orders of magnitude. These sensors grew out of the Programmable Digital Camera Project headed by Abbas El Gamal and Brian Wandell at Stanford University, and employ “multisampling” on picture elements to minimize noise and saturation. Pixels are grouped with independent analog-to-digital converters (ADCs), and sampled multiple times during each video frame. Conversion at each sensor group stops when either the frame time is up or the value nears saturation. In effect, each pixel group has its own electronic shutter and dynamic exposure system. For additional processing, the sensor chip is paired with a custom digital image processor, which handles video conversion and control. Pixim currently markets the sensors and development kits to OEMs, and it has been picked up by a few security camera makers. Smartvue (www.smartvue.com) has based its S2 line of wireless surveillance cameras on the Pixim chip set, and Baxall (www.baxall.com) recently introduced its Hyper-D camera.


4.9.4 SPHERONVR 


SpheronVR of Kaiserslautern, Germany (www.spheron.com), has what is undeniably the highest-resolution and highest-performance HDR camera in existence: the SpheroCam HDR. This device boasts the ability to capture full spherical panoramas at a resolution of up to 13,000 x 5,300 pixels, covering nearly eight orders of magnitude in dynamic range (a 10^8:1 contrast ratio). However, because they use a line-scan CCD for their capture, the process takes from 15 to 30 minutes to complete a full 360-degree scan at this resolution. Lower resolutions and dynamic ranges will scan faster, but one can never achieve a single-shot capture with a line-scan camera because the device must mechanically pan over the scene for its exposure. Nevertheless, this is the system to beat for panoramic capture and critical image-based lighting applications, and their deluxe package comes with an advanced software suite as well. Figure 4.29 shows a SpheroCam HDR image captured in Napa Valley, California, at a resolution of about 3,000 x 2,100. The dynamic range is 5.5 orders of magnitude.


4.9.5 POINT GREY RESEARCH 


Point Grey Research, of Vancouver, Canada (www.ptgrey.com), recently came out with an upgrade of the SDK for their LadyBug spherical video camera, which enables it to capture six perspective HDR images in a single shot. Five 1/3-inch SVGA sensors (1,024 x 768) look in a circle of horizontal directions to obtain a panorama, and a sixth sensor looks straight up, yielding a final image that covers 75% of the full sphere. Data are delivered in real time via a FireWire cable to a tethered host computer. Figure 4.30 shows an example HDR result with a resolution of 3,600 x 1,500 and a dynamic range in excess of four orders of magnitude. Notably, there is no evidence of ghosting or smearing problems, which one would see if the image were multiply exposed.

FIGURE 4.29 An HDR panorama captured by Spheron’s SpheroCam HDR line-scan camera, tone mapped using a histogram adjustment operator.


4.10 CONCLUSIONS 


In the not-too-distant future digital still and motion picture photography may become exclusively HDR. After all, traditional film photography has provided medium-dynamic-range capture for nearly a century, and professionals expect and require this latitude during postproduction (i.e., printing). The current trend toward mixed reality in special effects is also driving the movie industry, which is increasingly digital, toward HDR. Advances in dynamic range will hit the professional markets first and slowly trickle into the semiprofessional price range over a period of years.

FIGURE 4.30 An HDR panorama captured in a single shot using the six CCD sensors of Point Grey Research’s LadyBug camera system.

Unfortunately, consumers will continue to be limited to LDR digital cameras in 
the short term, as HDR equipment will be priced out of reach for some years to 
come. During this interim period, software algorithms such as those described in 
this chapter will be the most affordable way of obtaining and experimenting with 
HDR imagery, and applications will hopefully push the market forward. 


Display Devices 


Image output devices fall into two ma- 
jor categories: printing (or hardcopy) de- 
vices and display (or softcopy) devices. 
The image printing category includes tra- 
ditional ink presses, photographic printers, 
and dye-sublimation, thermal, laser, and 
ink-jet printers — any method for deposit- 
ing a passive image onto a 2D medium. 
Some of these devices are capable of pro- 
ducing transparencies, but most are used to produce reflective prints. The image 
display category includes traditional cathode-ray tubes (CRTs), LCD flat-panel dis- 
plays, and LCD and DLP projectors — any method for the interactive display of im- 
agery on a 2D interface. Most, but not all, display devices include an integrated light 
source, whereas printed output usually relies on ambient illumination. In general, 
hardcopy output is static and passive, and softcopy output is dynamic and active. 
The challenges for presenting HDR imagery within these two classes are quite differ-
ent. We will look first at printing devices and then at interactive displays.


5.1 HARDCOPY DEVICES 


The first image-duplication systems were hardcopy devices, going all the way back 
to Johann Gutenberg’s invention of movable type and oil-based inks for the print- 
ing press in the fifteenth century. This was truly a digital device, requiring dextrous 
fingers to place the letters and designs in frames for creating master plates. (Wood 
block printing dating back to eighth-century China was more of an engraving trans- 
fer process.) Hand presses eventually gave way to powered flatbed cylinder presses 
in the 1800s, which are still used for many printing applications today. More sig-
nificant to this discussion, the dawn of photography in the latter half of the same
century opened a new horizon not only to the printing process but to what could 
in fact be printed. 

Significantly, the chemistry of black-and-white film (and later color negative 
stock) has been tailored to record HDR information. As discussed in Chapter 4, 
the photographic printing/enlargement process is where the original range of the 
negative is reduced to fit the constrained range of a standard reflection print. The 
additional depth in the shadowed and highlighted areas of the negative permits the
photographer or the processing lab to perform adjustments to the image exposure 
a posteriori to optimize the final image. This was the original use of the term tone 
mapping, now recognized to be so important to computer graphics rendering [131]. 

Figure 5.1 shows a color negative of an HDR scene next to a typical LDR print. 
The false color image on the right shows that the range recorded by the negative 
is actually quite large (nearly four orders of magnitude), and some information in 
the shadows is lost during standard printing. Using dodge-and-burn techniques, 
a skilled darkroom specialist could bring these areas out in a handmade print. By 
scanning the full dynamic range of the negative, one could alternatively apply one 
of the latest digital tone-mapping operators to compress this information into an
LDR output. This fits with the idea of storing a scene-referred image and applying
device-dependent tone mapping prior to final output. (See Chapters 6 through 8
on dynamic range reduction and tone-reproduction operators for further informa-
tion.)


5.1.1 THE REFLECTION PRINT 


As implicitly illustrated in all the figures of this and every other book, reflective print 
media is inherently LDR. Two factors are responsible for this. First, the brightest 
pixel in a reflection print is dictated by the ambient lighting. This same ambient 
light illuminates the area around the print, which we can generally assume to be a 
medium color (midgray being 18% reflectance, but see footnote 4 in Chapter 2). 
Thus, even the whitest paper stock with a 90% reflectance is perhaps five times as 
bright as its surroundings. A typical specular highlight on a sunny day is 500 times 
as bright as its surroundings, and light sources can be even brighter. Would it be 
FIGURE 5.1 A color photograph of an HDR scene. The negative shown on the left stores scene
luminance with its native logarithmic response. The middle image shows an LDR print, whereas 
the right-hand image shows the actual range available from the negative. 


possible to represent these outstanding highlights in a reflection print? Early artists 
recognized this problem and added gilding to their paintings and manuscripts [40], 
but this would be unreliable (not to mention expensive) in a commercial print 
setting. 

The second limitation of the contrast of reflection prints is maximum absorption, 
which is generally no better than 99.5% for most dyes and pigments. Even if we 
had a perfectly absorbing ink, the surface of the print itself reflects enough light 
to undermine contrast in the deep-shadow regions. Unless the illumination and 
background are very carefully controlled, the best contrast one can hope for in a 
good viewing environment is about 100:1, and it is often much less. 

Figure 5.2 shows a density chart, where adjacent bars differ by roughly 11% 
(well above the visible difference threshold) and are spaced for optimum visibility. 
FIGURE 5.2 A density chart, demonstrating that it is difficult to resolve differences at or below
1% reflectance (-2.0 log10 density) in printed images.


Even though we have given the image a black background in order to improve con-
trast visibility, the steps become indistinguishable well before a -2.0 log10 density
(1% reflectance). On an HDR display, these steps would be clearly visible all the 
way to the bottom of the chart. The fact that they are not demonstrates one of the 
inherent limitations of diffusely reflective media: LDR output. 
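
The numbers quoted in this section can be checked with a few lines of arithmetic. The following is a minimal sketch rather than a measurement procedure: the function names are ours, density is returned as a positive number (so 1% reflectance gives 2.0, the quantity the text writes as -2.0 on a log10 scale), and we assume each bar of the chart is exactly 11% darker than its neighbor.

```python
import math

def density(reflectance):
    """Optical density as the negative base-10 log of reflectance."""
    return -math.log10(reflectance)

def steps_to_reach(target_reflectance, start=1.0, step=0.11):
    """Count the bars needed to fall from `start` to `target_reflectance`
    when each bar is `step` (assumed 11%) darker than the previous one."""
    count = 0
    reflectance = start
    while reflectance > target_reflectance:
        reflectance *= (1.0 - step)
        count += 1
    return count

paper_white = 0.90   # whitest paper stock (figure from the text)
mid_gray = 0.18      # medium surround (figure from the text)
ink_black = 0.005    # 99.5% absorption leaves 0.5% reflectance

print(round(paper_white / mid_gray, 1))   # paper relative to its surround
print(round(density(0.01), 2))            # density at 1% reflectance
print(round(paper_white / ink_black))     # idealized paper-to-ink contrast
print(steps_to_reach(0.01))               # 11% bars needed to reach 1%
```

The idealized paper-to-ink ratio ignores surface reflection and ambient light, which is why the practical figure quoted above is closer to 100:1.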


5.1.2 TRANSPARENT MEDIA 


Not all hardcopy media are reflective. Some media are transparent and are designed 
to be projected. The most obvious example is movie film, although 35-mm slide 
transparencies and overhead transparencies bear mention as well. Fundamentally, 
transparencies overcome the two major limitations of reflective media: ambient 
lighting and maximum density. Because transparencies rely on a controlled light
source and optics for display, the ambient environment is under much tighter con-
trol. Most transparencies are viewed in a darkened room, with a dark surround- 
ing. For maximum density, we are only limited by film chemistry and printing 
method as to how dark our transparency can get. Three orders of magnitude are 
regularly produced in practice, and there is no physical limit to the density that can 
be achieved. 

Are slides and movies really HDR? Not really. They certainly have more dynamic 
range than standard reflection prints — perhaps by as much as a factor of 10. How- 
ever, viewers prefer higher contrast for images with a dark surround [29], and thus 
manufacturers of film oblige by creating high-contrast films for projection. The sen- 
sitive dynamic range of slide transparency film is actually quite narrow — about two 
orders of magnitude at most. Professional photographers are well aware of this lim-
itation. It is imperative to get the exposure and lighting exactly right, or there is no
advantage in shooting transparency film. Cinematographers have a little more room 
to move because they go through an additional transfer step in which the exposure 
can be adjusted, but the final print represents only a narrow range of luminances 
from the original scene. 

Although transparency film is not traditionally used as an HDR medium, it has 
this potential. Something as simple as a slide viewer with a powerful backlight 
could serve as a low-tech HDR display if there were some way of producing a 
suitable transparency for it. An example of such an approach is demonstrated in the 
following section. 


5.1.3 HDR STILL IMAGE VIEWER 


Figure 5.3 shows an HDR still-image viewer composed of three elements: a bright, 
uniform backlight, a pair of layered transparencies, and a set of wide-field stereo 
optics. The view mapping for the optics and the method of increasing dynamic 
range by layering transparencies are the two challenges faced [140]. The original 
prototype of this HDR viewer was created at the Lawrence Berkeley Laboratory in 
1995 to evaluate HDR tone-mapping operators, but it has only recently been put to 
this task [72]. In the configuration shown, the viewer provides a nearly 120-degree 
field of view, a maximum luminance of 5,000 cd/m², and a dynamic range of
over 10,000:1. It employs the Large Expanse Extra Perspective (LEEP) ARV-1 optics, 
FIGURE 5.3 An HDR viewer relying on layered film transparencies. The transparency position
is shown in red on the right-hand diagram.


which were designed by Eric Howlett and used in the original NASA virtual reality 
experiments [26].¹

The LEEP ARV-1 optics use a hemispherical fisheye projection, wherein the distance 
from the center of the image is proportional to the sine of the eccentricity (i.e., the 
angle from the central view direction). In addition, the optics exhibit significant 
chromatic aberration, which will cause colored fringes at the edges of view. This 
was originally corrected for by a matched camera with chromatic aberration in the 
opposite direction, but because we seek to render our views on a computer we apply 
an equivalent correction during image preparation (the Ca() function, described in 
the following material). The image must be high resolution in order not to appear 
blurred in the viewer (we found a resolution of 800 dpi (2,048 x 2,048) to be the 
minimum). A 4-by-5 film recorder is essential in producing transparencies at this 
size and resolution. 
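
The sine-law mapping described above can be sketched in a few lines. This is an illustration under our own assumptions, not the viewer's actual rendering code: the function names are hypothetical, the central view direction is taken to be the +z axis, and the image radius is normalized so that an eccentricity of 90 degrees lands at a distance of `radius` from the center.

```python
import math

def fisheye_project(direction, radius=1.0):
    """Map a 3D view direction to 2D image coordinates using a
    hemispherical (sine-law) fisheye: r = radius * sin(theta),
    where theta is the angle from the +z (central view) axis."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0 or z < 0.0:
        return None  # undefined direction, or behind the hemisphere
    # sin(theta) = sqrt(x^2 + y^2) / norm, so scaling (x, y) by 1 / norm
    # places the point at the required distance from the image center.
    return (radius * x / norm, radius * y / norm)

def eccentricity(point, radius=1.0):
    """Invert the mapping: recover theta (in radians) from image coordinates."""
    r = math.hypot(point[0], point[1]) / radius
    return math.asin(min(r, 1.0))
```

For example, a direction 30 degrees off axis maps to a point halfway between the image center and the edge of the hemisphere, because sin(30°) = 0.5.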


1 The ARV-1/diffuser assembly was obtained from Ulrecth Figge of Boston, MA. 
A film recorder typically consists of a small slow-scan CRT with a white phos-
phor, which is carefully scanned three times with each of three colored filters inter-
posed between the CRT and a film camera with a macro lens. The process is slow
and the equipment is increasingly rare, making the production of high-resolution
transparencies a costly proposition. Because the LEEP optics require a 2.5-by-5-inch
transparency pair, we must split the job into two 4-by-5 outputs, because film can-
not be printed to its borders. Furthermore, due to the difficulty of controlling
transparency exposures to achieve densities whereby the film response is highly
nonlinear, it is necessary to create two transparency layers per eye, doubling the cost
again.²

Figure 5.4 shows the method for splitting a single HDR image into two trans- 
parency layers, which will later be mounted one atop the other in the viewer. 
Because the same image separation is needed to drive the HDR softcopy displays 
(described in the next section), we explain the process here. The incoming image 
must be normalized such that the maximum pixel value is no greater than 1.0 (i.e., 
maximum transmission). First, the pixels in the original image are blurred, which 
circumvents the otherwise insurmountable problems of misregistration and paral- 
lax between the two layers. We use a Gaussian blur function to reduce the apparent 
resolution of the back image to roughly 32 x 32, although we have found that reso- 
lutions as low as 16 x 16 will work. We then take the square root to cut the original 
dynamic range of our back layer in half: This is the key to getting an HDR result, in 
that standard film recorders cannot handle more than an 8-bit/primary input file. 

By subsequently dividing this back layer into the original, we obtain the front 
image, which is passed through the Ca( ) function to correct for the aforementioned 
chromatic aberration. The Ca() function simply makes the red channel in the image 
1.5% larger than the blue, with green halfway between. By construction, the front 
layer will have enhanced edges that precisely compensate for the blurred back layer, 
as explained in the following material. Because densities add in layered transparencies
(i.e., transmittances multiply), the original HDR view is reproduced almost per-
fectly.
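
The channel-dependent magnification performed by the Ca() function can be sketched as follows. This is only an illustration under stated assumptions: the name `ca` is a hypothetical stand-in, each channel is a square 2D list, magnification is taken about the image center, blue is used as the unscaled reference, and nearest-neighbor sampling is our own simplification (a production version would interpolate).

```python
def scale_channel(channel, scale):
    """Magnify a square 2D list about its center by `scale`
    using nearest-neighbor sampling (an assumed resampling choice)."""
    n = len(channel)
    c = (n - 1) / 2.0
    out = []
    for y in range(n):
        row = []
        for x in range(n):
            # Inverse mapping: find the source pixel for each output pixel.
            sx = int(round(c + (x - c) / scale))
            sy = int(round(c + (y - c) / scale))
            sx = min(max(sx, 0), n - 1)
            sy = min(max(sy, 0), n - 1)
            row.append(channel[sy][sx])
        out.append(row)
    return out

def ca(red, green, blue, red_gain=0.015):
    """Hypothetical stand-in for Ca(): red is 1.5% larger than blue,
    with green halfway between."""
    return (scale_channel(red, 1.0 + red_gain),
            scale_channel(green, 1.0 + red_gain / 2.0),
            scale_channel(blue, 1.0))
```

At the 2,048 x 2,048 resolution mentioned earlier, a 1.5% magnification shifts the red channel by roughly 15 pixels at the image edge, which is why the correction is visible as colored fringes if omitted.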

Figure 5.5 demonstrates the recombination of image layers. By dividing our orig- 
inal image (reproduced on the right) by the blurred back image (shown on the left), 


2 The cost per image is about $50 U.S., and four images are required per view. 
FIGURE 5.4 The process whereby the original fisheye image for one eye is split into two trans-
parency layers, which when combined in the HDR viewer will reproduce the original luminances.


we obtain by construction the foreground image necessary to recover the original. 
As long as the dynamic range of the foreground transparency is not exceeded, this 
result is guaranteed. This method is called dual modulation. 
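
The splitting and recombination steps can be sketched in plain Python. This is a simplified illustration, not the code used to prepare the transparencies: it works on a small grayscale image stored as a 2D list of positive floats, the function names and the blur width `sigma` are our own choices, and the front layer is not clipped, so values above 1.0 indicate places where a real transparency's range would be exceeded.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    half = max(1, int(3 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve each row with `kernel`, clamping at the borders."""
    half = len(kernel) // 2
    width = len(image[0])
    return [[sum(k * row[min(max(x + i - half, 0), width - 1)]
                 for i, k in enumerate(kernel))
             for x in range(width)]
            for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def split_layers(image, sigma=2.0):
    """Return (back, front, normalized) for a grayscale HDR image,
    where back * front reproduces the normalized input."""
    peak = max(max(row) for row in image)
    norm = [[v / peak for v in row] for row in image]        # maximum becomes 1.0
    kernel = gaussian_kernel(sigma)
    blurred = transpose(blur_rows(transpose(blur_rows(norm, kernel)), kernel))
    back = [[math.sqrt(v) for v in row] for row in blurred]  # halves the log range
    front = [[n / b for n, b in zip(nrow, brow)]
             for nrow, brow in zip(norm, back)]
    return back, front, norm

def recombine(back, front):
    """Transmittances multiply when the layers are stacked."""
    return [[b * f for b, f in zip(brow, frow)]
            for brow, frow in zip(back, front)]
```

Because the front layer is defined as the quotient of the original and the back layer, the product recovers the normalized original exactly, up to floating-point rounding.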

However, even if the dynamic range of the front image is exceeded, the limi- 
tations of the human visual system help mask the artifacts. At the point where we 
overtax the capacity of the front image, a contrast on the order of 100:1, scattering 
in the eye makes it impossible to distinguish sharp boundaries. Figure 5.6 (left) 
shows the approximate point spread function of the human eye. Figure 5.6 (right) 
shows the desired and the reproduced image for a device such as this. Due to the
blurring of the back image, there is some spillover at the edges of this high-contrast 
FIGURE 5.5 Demonstration of how a blurred background image recombines with a carefully constructed foreground image to reproduce the original.


boundary. However, due to scattering in the eye, the human observer cannot see it. 
The bright central region effectively masks this error as an even greater amount of 
light spills over on the retina. 

The HDR transparency viewer described is an interesting device, as it demon- 
strates the feasibility of splitting the image into two layers that together produce 
an HDR view. However, its limitation to still imagery for a single observer makes 
it impractical for anything outside the laboratory. Even so, the same principles we 
have introduced here apply equally to HDR softcopy displays, particularly those de- 
veloped by Sunnybrook Technologies (discussed in the following section). 
FIGURE 5.6 The point spread function of the human eye (left) and its effect on the visibility of
spillover at high-contrast boundaries in a dual modulator’s output (right).


5.2 SOFTCOPY DEVICES 


For the purposes of discussion, we define a softcopy device as an electronic device 
that can be used in an interactive setting. This excludes movie film projectors, which
display in real time material whose preparation is far from real time. This section there-
fore focuses on the two most popular display technologies before we venture into
some of the newer and less well-known devices. 


5.2.1 CATHODE-RAY TUBES AND LIQUID CRYSTAL 
DISPLAYS 


The first softcopy device was the cathode-ray tube (CRT), invented by German 
physicist Karl Ferdinand Braun in 1897. A CRT is a vacuum tube configured to 
dynamically control the aim, intensity, and focus of an electron beam, which strikes 
a phosphor-coated surface that converts the energy into photons. By depositing 
red, green, and blue phosphors in a tight matrix and scanning the display surface 
at 30 Hz or more, the eye can be fooled into believing it sees a continuous 2D 
color image. Through these and other refinements, the CRT has held its place as the
leading softcopy display device over 100 years later, making it the most successful
and longest-lived electronics technology ever developed.³ Only in the past decade
has the liquid crystal display (LCD) begun to supplant a substantial portion of tra- 
ditionally CRT-based applications, and LCDs currently dominate today’s portable 
electronics market. 

A good part of the success of the CRT is its inherent simplicity, although a cen- 
tury of tinkering has brought many variations and tens of thousands of patents to 
the basic technology. By tracing an electron beam (usually a triple beam for RGB) 
across a fixed phosphor-coated matrix, the actual number of electronic connections
in a CRT is kept to a minimum. By comparison, an active-matrix LCD has an associ- 
ated circuit deposited on the glass by each pixel, which holds the current color and 
drives the liquid crystal. This adds up to millions of components on a single LCD 
display, with commensurate manufacturing costs and challenges (up to 40% of dis- 
plays off the assembly line are discarded due to “stuck” pixels and other problems). 
Even today there are only a handful of electronics makers capable of fabricating 
large active-matrix LCD screens, which other manufacturers then assemble into fi- 
nal products. 

Figure 5.7 compares the anatomy of a CRT pixel to that of an LCD. In a CRT, each 
pixel is scanned once per frame, and the phosphor’s gradual decay (coupled with the 
brain’s integration of flashed illumination faster than 60 Hz) makes the pixel appear 
as though it were constant. In an active-matrix LCD, the pixel is held constant by 
the combination of a capacitor and a thin-film transistor (TFT), which acts as a 
short-term memory circuit between refreshes. As we mentioned, this circuitry adds 
to the cost and complexity of the LCD relative to the CRT, although these costs will 
reach parity soon. When one considers the end-to-end cost of CRTs, it is seen that 
their additional bulk and weight create shipping, handling, and disposal difficulties 
far beyond those of LCDs (and most other replacement technologies). LCDs have 
already surpassed CRT sales in the computer display market and are poised to take 
over the television market next. 

Regarding dynamic range, CRTs and LCDs have some important differences. The 
fundamental constraint for CRTs is their maximum brightness, which is limited 


3 Technically, the battery has been in use longer, but the battery does not fit within the standard definition of “electronics,” 
which is the behavior of free electrons in vacuum, gasses, and semiconductors. 
by the amount of energy we can safely deposit on a phosphorescent pixel without
damaging it or generating unsafe quantities of X-ray radiation. By comparison, 
there is no fundamental limit to the amount of light one can pass through an LCD 
screen, and in fact the LCD itself need not change (only the backlight source). 
However, CRTs have one advantage over standard LCDs, which is that a CRT pixel 
can be switched off completely, whereas an LCD pixel will always leak some small 
but significant quantity of light (limiting its effective dynamic range). Technically, 
a CRT display has a very high dynamic range, but it is not useful to us because the 
range is all at the low end, where we cannot see it under normal viewing conditions. 
Conversely, the LCD can achieve high brightness, but with a limited dynamic range. 

The only way to improve the dynamic range of an LCD is to modulate the back- 
light. Because most LCD backlights are uniform sources, one can only alter the 
overall output of the display in such a configuration. Of course, uniform modula- 
tion would not improve the dynamic range for a single frame or image, but over a 
sequence of frames one could achieve any dynamic range one desires. Indeed, some 
manufacturers appear to have implemented such an idea, and there is even a patent 
on it. However, having a video get drastically brighter and dimmer over time does 
not fulfill the need for additional dynamic range within a single frame. This gives 
rise to alternative technologies for providing local LCD backlight modulation. Two 
such approaches are described in the following section.
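
The difference between dynamic range within a frame and across a sequence can be made concrete with a little arithmetic. The sketch below uses illustrative values that are not from the text: a panel contrast of 300:1 and three uniform backlight settings are assumptions chosen only to show the effect.

```python
def frame_range(panel_contrast, backlight_level):
    """Brightest and darkest pixel in one frame under a uniform backlight."""
    white = backlight_level
    black = backlight_level / panel_contrast
    return white, black

panel_contrast = 300.0        # hypothetical LCD contrast ratio
levels = [1.0, 0.1, 0.01]     # hypothetical uniform backlight settings

per_frame = []
whites = []
blacks = []
for level in levels:
    white, black = frame_range(panel_contrast, level)
    per_frame.append(white / black)   # range available within this frame
    whites.append(white)
    blacks.append(black)

# Range across the whole sequence: brightest white over darkest black.
sequence_range = max(whites) / min(blacks)

print([round(r) for r in per_frame])
print(round(sequence_range))
```

Every frame is still limited to the panel's own contrast, whereas the sequence as a whole spans the product of the panel contrast and the backlight dimming ratio. Local modulation is what brings that product into a single frame.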


5.2.2 SUNNYBROOK TECHNOLOGIES’ HDR DISPLAYS 


Sunnybrook Technologies of Vancouver, Canada (www.sunnybrooktech.com), has ex- 
plored both projector-based and light-emitting diode (LED)-based backlight mod- 
ulators in its HDR display systems [114,115]. Similar to the concept presented in 
Section 5.1.3, a low-resolution modulator is coupled with a compensated high- 
resolution front image (the LCD) to provide an HDR display free of pixel registra- 
tion problems. The principal difference is that the Sunnybrook displays are dynamic 
and can show video at real-time rates. As these are otherwise conventionally config- 
ured displays, they have the external appearance of a standard monitor and unlike 
the original transparency viewer are not restricted to a single observer.

A diagram of Sunnybrook’s projector-based display is shown in Figure 5.8. The 
original prototype employed an LCD-based projector, and the later models use a 
FIGURE 5.8 Sunnybrook Technologies’ projector-based display [114].


modified black-and-white DLP projector (see Section 5.2.3). All units employ a 
high-resolution LCD panel as the final view portion of their display. By using the 
projector to effectively modulate the backlight of the front LCD, they are able to 
present images with a dynamic range in excess of 50,000:1. Depending on the 
exact configuration, the maximum luminance can be up to 2,700 cd/m² for a
single observer, at least 8 times brighter than today’s standard LCD displays and
15 times brighter than the best CRTs.

However, there are a number of important drawbacks to using a projector as 
a backlight. First, the optical path required by the projector means that the display 
itself is large — about 100 cm in depth. Custom optics or some mirror arrangement 
could reduce this dimension, but similar to a projection-based television it will 
never be as small as a CRT display of similar display area and resolution. Second, the 
incorporation of a Fresnel lens to boost brightness and improve uniformity incurs 
a cost in terms of light falloff at wider viewing angles.⁴ Finally, the light source
for the projector must be extremely bright in order to support a high maximum 


4 The Fresnel lens is a thin sheet of acrylic, embossed with a circular pattern that simulates a much thicker lens. This is
preferred in applications where cost and weight are more important than image quality. Because the rear image is low 
resolution and the Fresnel lens is followed by a diffuser, this arrangement has no impact on image quality. 
FIGURE 5.9 Sunnybrook Technologies’ LED-based display [114].


luminance through two modulators, and this translates to a high (and unvarying) 
power consumption with associated heat dissipation issues. 

In consideration of these problems, Sunnybrook subsequently developed the 
LED-based display shown in Figure 5.9. Replacing the projector as a backlight, this 
newer display employs a low-resolution honeycomb (hexagonal) array of white 
LEDs mounted directly behind the LCD’s diffuser. No Fresnel lens is needed to com- 
pensate for projector beam spread, and because the LEDs are individually powered 
consumption is no longer constant but is directly related to display output. Because 
most HDR images will have only a fraction of very bright pixels (less than 10%), 
the average power consumption of this device is on par with a standard CRT display. 
Furthermore, because the LED array is inherently low resolution, Sunnybrook is able 
to encode the data needed in the first scan line of the incoming video signal, rather 
than providing a separate video feed as required by the projector-based display. 

The LED-based display has a higher maximum output (8,500 cd/m²), with a
similar dynamic range. The chief drawback of this new design is the current cost 
of the high-output white LEDs used in the backlight. Fortunately, the cost of these 
relatively new components is dropping rapidly as the market ramps up, and the
price point is expected to be in the reasonable range by the time the display is 
ready for market. In contrast, the cost of high-output digital projectors has largely 
leveled off, and the price of the projector-based display will always be greater than 
the projector inside it. 


5.2.3 OTHER DISPLAY TECHNOLOGIES 


Most other work on HDR display technology is happening in the nascent field of 
digital cinema, whereby major studios and theater chains are hoping to replace their 
current film-based equipment with electronic alternatives. Already, over a hundred 
theaters in the United States have installed digital projection systems. Most of these 
projectors use the Texas Instruments Digital Light Processing (DLP) system, based 
on their patented Digital Micromirror Device (DMD). 

These devices were the first large-scale commercial application of microelectro- 
mechanical systems (MEMS). A DMD chip consists of a small high-resolution array 
of electrically-controlled two-position mirrors, a subsection of which is pictured in 
Figure 5.10. Each mirror is individually controlled and held in position by an un- 
derlying circuit, similar to that of an active-matrix LCD. The chief difference is that 
rather than transmitting a percentage of the light and absorbing the rest, the DMD 
reflects about 85% of the incident radiation, but in a controlled way that permits 
the desired fraction to continue onto the screen and the rest to be deflected by 10 to 
12 degrees onto an absorbing baffle. Thus, the DMD can handle much greater light 
intensities without risk of overheating or light-associated damage, despite its small 
area. Because it is inherently a binary device, time modulation is used to control the 
average output at each pixel. For example, a micromirror at 25% output is in the 
“off” orientation 75% of the time. Color is achieved either by ganging three chips 
through a beam splitter or by using a flying color wheel whereby red, green, and 
blue images are presented sequentially to the screen. This is all made possible by the 
fast switching times of the micromirror elements (about 15 microseconds). 
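
Time modulation of a binary mirror can be sketched as follows. Binary-weighted time slices are one common way to realize pulse-width modulation; the text does not say which scheme DLP systems actually use, so the scheme, the function names, and the 60-Hz frame rate used in the usage note are all assumptions. Only the 15-microsecond switching time comes from the text.

```python
def mirror_schedule(level, bits=8):
    """Binary-weighted time slices for one pixel.
    Returns (slice_length, mirror_on) pairs whose on-time
    fraction approximates `level` in [0, 1]."""
    code = round(level * (2 ** bits - 1))
    schedule = []
    for bit in reversed(range(bits)):
        schedule.append((2 ** bit, bool((code >> bit) & 1)))
    return schedule

def duty_cycle(schedule):
    """Fraction of the frame during which the mirror is in the on position."""
    total = sum(length for length, _ in schedule)
    on = sum(length for length, is_on in schedule if is_on)
    return on / total

def max_bits(frame_us, switch_us=15.0, colors=1):
    """Largest bit depth whose shortest slice lasts at least one
    switching time, when `colors` primaries share the frame."""
    slots = frame_us / colors / switch_us
    bits = 0
    while 2 ** (bits + 1) - 1 <= slots:
        bits += 1
    return bits
```

Under these assumptions, `duty_cycle(mirror_schedule(0.25))` is close to 0.25, matching the example of a mirror that spends 75% of its time in the off orientation. With a 60-Hz frame (about 16,667 microseconds), `max_bits` gives 10 bits for a single-color chip and 8 bits per color when a color wheel divides the frame among three primaries.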

In principle, there is no reason to believe that DMD technology would not enable 
direct HDR display. In practice, however, the dynamic range is limited by the amount 
of light scattering from mirror edges, hinges, and the spacing required for clearance. 
Hence, the actual, delivered dynamic range of commercial DLP chips is on the order 
FIGURE 5.10 Twelve pixels on a Texas Instruments micromirror (DMD) array, and a section
detailing two neighboring pixels. Each square mirror is 16 microns on a side, and DMD resolutions
up to 1,024 x 768 are available.


of 500:1, despite some manufacturers’ more optimistic claims (usually based on 
“all-on” versus “all-off” measurements). With time, we can hope that this ratio 
will continue to improve, and Texas Instruments’ latest DDR DMD chips employ a
dark inner coating to minimize unwanted reflections. However, there appear to be 
practical limits to how far DMD technology can go. 

An even more promising projection technology, which has been on the horizon 
for some years now, is Silicon Light Machines’ grating light valve (GLV), shown in 
Figure 5.11. This MEMS device provides rapid and efficient control of laser reflection 
via a tiny, controllable diffraction grating. Similar to the DMD in concept, the GLV 
uses smaller-scale elements (a few microns wide), with displacements smaller than 
the wavelength of visible light. This yields rapid, continuous control (about 0.1 
microseconds from 0 to 100%) between mirror and diffraction grating in what is 
inherently an analog device. Although no commercial displays are yet available using 
this technology, the design trend is toward vertical (column) arrays, swept across the 
FIGURE 5.11 Silicon Light Machines’ micrograting (GLV) pixel. Each ribbon is about 5 mi-
crons wide.


screen to make a complete image using laser scanning or similar techniques [14]. It 
is difficult to predict what to expect of these devices in terms of dynamic range, but 
there are several parameters available for tuning and few apparent limitations to the 
ultimate control that may be achieved [97]. Operating at wavelength scales provides 
new opportunities for control efficiency that larger devices such as LCDs and DMDs 
cannot easily match. 

Finally, the HDR potential of LED-based displays bears mentioning. An active 
matrix of LEDs would solve nearly all of the problems discussed so far. Outputs 
could go down to zero at each pixel, potentially generous maximum outputs would 
be possible, and cross-talk between pixels would be negligible. 

Unfortunately, the technical barriers involved in constructing such a display are 
formidable. The large-scale microcircuit fabrication requirements are similar to 
those of LCD panels, except that the power levels are greater by several orders of 
magnitude. In addition, the color and output variation in current LED manufactur- 
ing is high, causing makers to rely on “binning” individual LEDs into groups for 
consistency. It is difficult to see how binning could be used in the manufacture of 
a display with over a million such devices, but manufacturing methods continue to
improve, and we expect that production will be more consistent in a few years. 

However, heat dissipation is critical, as LED output is very sensitive to temper- 
ature and efficacies are too low at present for a practical large HDR display. So far, 
only Kodak and Sony have marketed products using organic light-emitting diode 
(OLED) displays, and these are comparatively small, low-output devices.⁵ Never-
theless, because LED displays are inherently HDR, the potential is there.


5 Kodak's NUVUE AM550L device is 44 x 33 mm² at 520 x 220 resolution, with a 120-cd/m² maximum output level.


The Human Visual System 
and HDR Tone Mapping 


The dynamic range of illumination in a 
real-world scene is high — on the order of 
10,000 to 1 from highlights to shadows, 
and higher if light sources are directly vis- 
ible. A much larger range of illumination 
can also occur if the scene includes both an 
outdoor area illuminated by sunlight and 
an indoor area illuminated by interior light 
(see, for example, Figure 6.1). Using tech- 
niques discussed in Chapter 4, we are able to capture this dynamic range with 
full precision. Unfortunately, most display devices and display media available to us 
come with only a moderate absolute output level and a useful dynamic range of less 
than 100 to 1. The discrepancy between the wide ranges of illumination that can 
be captured and the small ranges that can be reproduced by existing displays makes 
the accurate display of the images of the captured scene difficult. This is the HDR 
display problem, or HDR tone-mapping problem. We introduce the tone-mapping 
problem in this chapter and discuss individual solutions in detail in the following 
two chapters. 
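
The size of the mismatch can be illustrated with a short calculation. The following is a sketch with made-up sample luminances spanning the 10,000:1 range mentioned above; the function names are ours, and the linear fit shown is deliberately naive.

```python
import math

def orders_of_magnitude(ratio):
    """Express a contrast ratio as a number of decades."""
    return math.log10(ratio)

def linear_fit(scene_luminance, scene_max, display_contrast):
    """Scale so that scene_max maps to display white (1.0), then clamp to
    the darkest level the display can show (1 / display_contrast)."""
    scaled = scene_luminance / scene_max
    return max(scaled, 1.0 / display_contrast)

scene = [1.0, 10.0, 100.0, 1000.0, 10000.0]   # hypothetical luminances, 10,000:1
shown = [linear_fit(v, max(scene), 100.0) for v in scene]

print(orders_of_magnitude(10000.0), orders_of_magnitude(100.0))
print(shown)
```

With a 100:1 display, the three darkest samples all land on the display's black level, so two of the scene's four decades are lost outright when the image is simply scaled to fit.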


6.1 TONE-MAPPING PROBLEM 


For a display to exhibit realism, the images should be faithful visual representations 
of the scenes they depict. This is not a new problem. Artists and photographers 
have been addressing this problem for a long time. The core problem for the artist 
(canvas), photographer (positive print), and us (display device) is that the light 
intensity level in the environment may be completely beyond the output level re- 
produced by the display medium. In addition, the contrast experienced in a real 
FIGURE 6.1  Image depicting both indoor and outdoor areas. The different lighting conditions
in these areas give rise to an HDR. (Image courtesy of the Albin Polasek Museum, Winter Park,
Florida.)


environment may greatly exceed the contrast range that can be reproduced by those 
display devices. Appearance of a scene depends upon the level of illumination and 
the contrast range [31]. Some commonly noticed examples are that scenes appear 
more colorful and contrasty on a sunny day, colorful scenes of the day appear gray 
during night, and moonlight has a bluish appearance. Hence, simple scaling or 
compression of the intensity level and the contrast range to fit them into the dis- 
play limits is not sufficient to reproduce the accurate visual appearance of the scene. 
Tumblin and Rushmeier [131] formally introduced this problem and suggested the 
use of visual models for solving this problem (see Figure 6.2 for a pictorial outline). 
Ever since, developing tone-mapping algorithms that incorporate visual models has 
FIGURE 6.2  Pictorial outline of the tone-mapping problem. The ultimate goal is a visual match
between the observed scene and the tone-mapped version of the captured HDR image on the display.


been an active area of research within the computer graphics and digital-imaging 
communities. 

Reproducing the visual appearance is the ultimate goal in tone mapping. How- 
ever, defining and quantifying visual appearance itself is not easy and is a current 
research topic [31]. Instead of delving deep into appearance-related issues in tone 
mapping, in this chapter we address one basic issue for realistic display of HDR im- 
ages. First, the HDR must be reduced to fit the display range. This can be achieved 
by simple scaling of the image. However, such simple scaling often generates im- 
ages with complete loss of detail (contrast) in the resulting display (Figure 6.3). 
That gives us a seemingly simpler problem to solve: how to compress the dynamic 
range of the HDR image to fit into the display range while preserving detail. 

The human visual system deals with a similar problem on a regular basis. The 
signal-to-noise ratio of individual channels in the visual pathway (from retina to 
brain) is about 32 to 1, less than 2 orders of magnitude [19,55]. Even with this 
dynamic range limitation, the human visual system functions well: it allows us 
FIGURE 6.3  HDR image depicting both indoor and outdoor areas. Linear scaling was applied
to demonstrate the loss of detail that results. (Image courtesy of the Albin Polasek
Museum, Winter Park, Florida.)


to function under a wide range of illumination, and allows us to simultaneously 
perceive the detailed contrast in both the light and dark parts of an HDR scene. 
Thus, if the goal is to match this perceived realism in the display of HDR images, it
is important to understand some of the basics of the human visual system. Hence,
this chapter focuses on aspects of the human visual system relevant to HDR imaging. 
We show that most tone-mapping algorithms currently available make use of one 
of a small number of visual models to solve the HDR problem. 

The material described in the following sections has been distilled from the psychophysics
and electrophysiology literature, wherein light is variously measured as
quanta, intensity, luminance, radiance, or retinal illuminance. To avoid confusion,
wherever possible we will use the term luminance. If any unit other than luminance
is required, we provide the units in which they originally appeared in the literature.


6.2 HUMAN VISUAL ADAPTATION 


A striking feature of the human visual system is its capacity to function over the 
huge range of illumination it encounters during the course of a day. Sunlight can be 
as much as a million times more intense than moonlight. The intensity of starlight 
can be one-thousandth of the intensity of moonlight. Thus, the effective range of 
illumination is more than a billion to one [135]. The dynamic range simultaneously 
available in a single scene at a given time is much smaller, but still hovers at about 
four orders of magnitude. 

The visual system functions in this range by adapting to the prevailing conditions 
of illumination. Thus, adaptation renders our visual system less sensitive in daylight 
and more sensitive at night. For example, car headlights that let drivers drive at 
night go largely unnoticed in daylight, as shown in Figure 6.4. 

FIGURE 6.4  Although the headlights are on in both images, during daylight our eyes are less
sensitive to car headlights than at night.


FIGURE 6.5  Threshold versus intensity (TVI) relation. The plot on the right shows the visual
threshold $\Delta I_b$ at different background intensities $I_b$.


In psychophysical studies, human visual adaptation is evaluated by measuring
the minimum amount of incremental light by which an observer distinguishes a test
object from the background light. This minimum increment is called a visual threshold
or just-noticeable difference (JND). In a typical threshold measurement experiment, a
human subject stares at a wide blank screen for a sufficient amount of time to
adjust to its uniform background intensity, $I_b$. Against this uniform background a
small test spot of intensity $I_b + \Delta I$ is flashed. This test spot is called the stimulus.
The increment $\Delta I$ is adjusted to find the smallest detectable $\Delta I_b$. The value of
this threshold depends on the value of the background, as shown in Figure 6.5.
This figure plots typical threshold versus intensity (TVI) measurements at various
background intensities. Over much of the background intensity range, the ratio

$$\frac{\Delta I_b}{I_b}$$



is nearly constant, a relation known for over 140 years as Weber’s law. The value of 
this constant fraction is about 1% [135], which can vary with the size of the test 
spot and the duration for which the stimulus is shown. The constant nature of this 
fraction suggests that visual adaptation acts as a normalizer, scaling scene intensities 
to preserve our ability to sense contrasts within scenes. 

Visual adaptation to varying conditions of illumination is thought to be possible 
through the coordinated action of the pupil, the rod-cone system, photochemi- 
cal reactions, and photoreceptor mechanisms. The role of each of these factors is 
discussed in the following sections. 


6.2.1 THE PUPIL 


After passing through the cornea and the aqueous humor, light enters into the visual 
system through the pupil, a circular hole in the iris (Figure 6.6) [38,48,49]. One of 
the mechanisms that allows us to adapt to a specific lighting condition is regulation 
of the amount of light that enters the eye via the size of the opening of the pupil. In 
fact, the pupil changes its size in response to the background light level. Its diameter 
changes from a minimum of about 2 mm in bright light to a maximum of about 
8 mm in darkness. This change accounts for a reduction in light intensity entering 
the eye by only a factor of 16 (about 1 log unit). In a range of about 10 billion to 1, 
the intensity regulation by a factor of 16 is not very significant. Hence, the pupil’s 
role in visual adaptation may be ignored for the purpose of tone reproduction. 
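
As a quick check of the factor quoted above: assuming the light admitted scales with the area of the pupil, and therefore with the square of its diameter, the extremes of 8 mm and 2 mm give

$$\left(\frac{8\ \text{mm}}{2\ \text{mm}}\right)^2 = 16, \qquad \log_{10} 16 \approx 1.2,$$

which is roughly one log unit out of the nine or ten that the visual system must cover.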


6.2.2 THE ROD AND CONE SYSTEMS 


Light that has passed through the pupil travels through the lens and the vitreous 
body before reaching the retina, where it is reflected from a pigmented layer of 
cells before being absorbed by photoreceptors. The latter convert light into neural 
signals before they are relayed to other parts of the visual system. The human retina 
FIGURE 6.6  Schematic diagram of the human eye and its various components. (Image courtesy
of Karen Lefohn.)


has two distinct types of photoreceptors: rods and cones. Rods are very sensitive to 
light and are responsible for vision from twilight illumination to very dark light- 
ing conditions. Cones are relatively less sensitive and are responsible for vision in 
daylight to moonlight. Depending on whether the vision is mediated by cones or 
rods, illumination is broadly divided respectively into photopic and scotopic ranges. The 
boundary between photopic and scotopic is fuzzy, with some overlap occurring. 
A range of illumination from indoor light to moonlight, in which both rods and
cones are active, is referred to as the mesopic range. Rods and cones divide the huge
range of illumination into approximately two smaller ranges, and each adapts
to its own range.
FIGURE 6.7  TVI curves for rods and cones.


The manifestation of adaptation of rods and cones in their respective ranges of 
illumination is shown in the TVI plots in Figure 6.7. The solid line corresponds to 
the thresholds for rods, and the dashed line corresponds to the threshold for cones. 
In scotopic illumination conditions, rods are more sensitive than cones and have a 
much lower threshold, and the vision in those illumination conditions is mediated 
by the rod system. 

As illumination is increased, cones become increasingly sensitive (demonstrated 
by the crossover of the cone and rod TVI curves). At higher illuminations, rods 
begin to saturate and eventually the rod system becomes incapable of discrimi- 
nating between two lights that differ in intensity by as much as a factor of one 
hundred [51]. The equation of the rod curve shown in Figure 6.7 is

$$\Delta I_b = 0.1\,(I_b + 0.015),$$

and the equation describing the cone curve is

$$\Delta I_b = 0.02\,(I_b + 8).$$


These equations are due to Rushton and MacLeod and fit their threshold data [111]. 
In this case, the equations are given in trolands (td), which is a measure of retinal 
illuminance. A value of 1 td is obtained when a surface with a luminance of 1 cd/m²
is viewed through a pupil opening of 1 mm². Thus, trolands are given as luminance
times area of the pupil.

The diameter $d$ of the circular pupil as a function of background luminance
may be estimated [149]. Moon and Spencer [83] propose the following relation
between luminance and pupil diameter.

$$d = 4.9 - 3\tanh\bigl(0.4\,(\log L + 1.0)\bigr)$$

Alternatively, de Groot and Gebhard [89] estimate the pupil diameter to be

$$\log d = 0.8558 - 4.01 \times 10^{-4}\,(\log L + 8.6)^3.$$

In both of these equations the diameter $d$ is given in mm, and $L$ is the luminance
in cd/m².
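
The two pupil-diameter estimates and the troland conversion can be tried out numerically. The following is a minimal sketch, not a reference implementation: the function names are ours, and we assume that "log" in both relations denotes the base-10 logarithm.

```python
import math

def pupil_diameter_moon_spencer(L):
    """Pupil diameter in mm for a background luminance L in cd/m^2."""
    # Assumption: "log" in the text is the base-10 logarithm.
    return 4.9 - 3.0 * math.tanh(0.4 * (math.log10(L) + 1.0))

def pupil_diameter_de_groot_gebhard(L):
    """Pupil diameter in mm from log d = 0.8558 - 4.01e-4 (log L + 8.6)^3."""
    log_d = 0.8558 - 4.01e-4 * (math.log10(L) + 8.6) ** 3
    return 10.0 ** log_d

def trolands(L, d):
    """Retinal illuminance in td: luminance (cd/m^2) times pupil area (mm^2)."""
    pupil_area = math.pi * (d / 2.0) ** 2
    return L * pupil_area

if __name__ == "__main__":
    # Sample luminances spanning dim to bright conditions (illustrative values).
    for L in (0.01, 1.0, 100.0, 10000.0):
        d_ms = pupil_diameter_moon_spencer(L)
        d_gg = pupil_diameter_de_groot_gebhard(L)
        print(f"L={L:>8} cd/m^2  d_MS={d_ms:.2f} mm  d_GG={d_gg:.2f} mm  "
              f"td={trolands(L, d_ms):.3g}")
```

Both estimates shrink the pupil as luminance rises, so retinal illuminance in trolands grows somewhat more slowly than scene luminance.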

The role of rods and cones in adaptation is important, and deserves consideration 
when dealing with intensities of extreme dynamic range. However, the individual 
operating ranges of rods and cones are still very large (a million to one). Thus, 
additional processes must play a significant role in their adaptation. 


6.2.3 PHOTO-PIGMENT DEPLETION AND 
REGENERATION 


Light is absorbed by the rod and cone photoreceptors through a photochemical 
reaction. This reaction breaks down photosensitive pigments and temporarily renders
them insensitive, a process called bleaching. The pigments are regenerated
in a relatively slow process. Thus the visual adaptation as a function of light inten- 
sity could be attributed to the depletion and regeneration of photo-pigment. Rod 
photo-pigments are completely depleted when exposed to light intensity above the 
mesopic range. It is believed that this depletion renders rods inoperable in the pho- 
topic range. 

However, cone photo-pigments are not significantly depleted even in bright sun- 
light, but as demonstrated in the TVI relationship the sensitivity of the cones con- 
tinues to diminish as a function of background intensity. This lack of correlation 
between photo-pigment concentration and visual sensitivity, as well as other exper- 
imental evidence, suggests that unless virtually all pigments are bleached the visual 
adaptation to different illumination conditions cannot be completely attributed to 
photo-pigment concentration [19]. 


6.2.4 PHOTORECEPTOR MECHANISMS 


Photoreceptors convert absorbed light energy into neural responses. Intracellular
recordings show that the response characteristics of rods and cones have the following
behavior.¹ Compared to the broad range of background light intensities over
which the visual system performs, photoreceptors respond linearly to a rather nar- 
row range of intensities. This range is only about 3 log units, as shown in Figure 6.8. 
The log-linear plot in this figure of the intensity-response function is derived from 
measurements of the response of dark-adapted vertebrate rod cells on brief expo- 
sures to various intensities of light [19]. 

The response curve of cones follows the same shape as the response curve of rod 
cells. However, because of the higher sensitivity of rod cells to light the response 
curve for the cones appears to the right on the log intensity axis. Figure 6.9 shows 
the response curves for both rods and cones. 


1 Electrophysiology is a field of study that may be used to detect the response of individual cells in the human visual
system. While the visual system is stimulated with a pattern of light, single-cell recordings are made whereby a thin
electrode is held near the cell (extracellular recordings) or inside the cell (intracellular recordings), thus measuring the
cell’s electrical behavior [92].
FIGURE 6.8  Response of dark-adapted vertebrate rod cells to various intensities. The intensity
axis in the image is shown in arbitrary units [19].


The response curves for both rods and cones can be fitted with the following
equation.

$$\frac{R}{R_{\max}} = \frac{I^n}{I^n + \sigma^n} \qquad (6.1)$$

Here, $R$ is the photoreceptor response ($0 < R < R_{\max}$), $R_{\max}$ is the maximum
response, $I$ is light intensity, and $\sigma$ is the semisaturation constant (the intensity that
causes the half-maximum response). Finally, $n$ is a sensitivity control exponent that
has a value generally between 0.7 and 1.0 [19].

FIGURE 6.9  Response of dark-adapted rod and cone cells to various intensities in arbitrary units.


This equation, known as the Michaelis-Menten equation (or Naka-Rushton
equation), models an S-shaped function (on a log-linear plot) that appears repeatedly
in both psychophysical experiments [1,50,134,147] and widely diverse
direct-neural measurements [19,41,44,65,87,133]. The role of $\sigma$ in Equation 6.1
is to control the position of the response curve on the (horizontal) intensity axis.
It is thus possible to represent the response curves of rods and cones shown in
Figure 6.9 by simply using two different values of $\sigma$, say $\sigma_{rod}$ and $\sigma_{cone}$, in
Equation 6.1.
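
To see how the semisaturation constant positions the curve, Equation 6.1 can be evaluated directly. This is a small sketch under stated assumptions: the function name, the exponent 0.9, and the two semisaturation values are illustrative choices of ours, in arbitrary intensity units, not measured constants.

```python
def photoreceptor_response(I, sigma, n=0.9, r_max=1.0):
    """Equation 6.1: R = r_max * I^n / (I^n + sigma^n)."""
    I_n = I ** n
    return r_max * I_n / (I_n + sigma ** n)

# Hypothetical semisaturation constants (arbitrary units), chosen only so
# that the rod curve sits to the left of the cone curve.
SIGMA_ROD = 1.0
SIGMA_CONE = 100.0

if __name__ == "__main__":
    # Sample the two curves at decade steps of intensity.
    for exponent in range(-3, 6):
        I = 10.0 ** exponent
        rod = photoreceptor_response(I, SIGMA_ROD)
        cone = photoreceptor_response(I, SIGMA_CONE)
        print(f"log I = {exponent:+d}  rod = {rod:.3f}  cone = {cone:.3f}")
```

At $I = \sigma$ the response is exactly half of the maximum, which is why changing $\sigma$ slides the whole curve along the log-intensity axis without changing its shape.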


Photoreceptor Adaptation The response curves shown in Figures 6.8 and 6.9 
demonstrate that when the dark-adapted photoreceptor is exposed to a brief light of 
moderately high intensity the response reaches its maximum and the photoreceptor 
is saturated. The photoreceptor loses sensitivity to any additional light intensity. 
This initial saturation of the photoreceptor matches our visual experience of
blinding brightness when exposed to light about a hundred or more times more
intense than the current background intensity. However, this initial experience does 
not continue for long. If exposed to this high background intensity for a while, 
the human visual system adapts to this new environment and we start to function 
normally again. 

Measurements have shown that if photoreceptors are exposed continuously to 
high background intensities the initial saturated response does not continue to re- 
main saturated. The response gradually returns toward the dark-adapted resting re- 
sponse, and the photoreceptor’s sensitivity to incremental responses is gradually 
restored. Figure 6.10 shows the downward shift in the measured response at two 
different background intensities (shown in vertical lines). An interesting observa- 
tion is that the response never completely returns to the resting response. Rather, 
it stabilizes on a plateau. Figure 6.10 shows the plateau curve (lower curve) for 
a range of background intensities. In addition to the restoration of sensitivity, the 
intensity-response curve measured at any given background intensity shows a right 
shift of the response-intensity curve along the horizontal axis, thus shifting the nar- 
row response range to lie around the background intensity. The shifted curves are 
shown in Figure 6.11. 

Independent measurements have verified that the shapes of the intensity- 
response curves at any background are independent of the background. However, 
with background intensity the position of the response function shifts horizon- 
tally along the intensity axis. This shift indicates that given sufficient time to adapt 
the visual system always maintains its log-linear property for about 3 log units of 
intensity range around any background. This shift is also modeled by the Michaelis-Menten
equation by simply increasing the value of the semisaturation constant $\sigma$ as
a function of the background intensity. This yields the modified equation

$$\frac{R}{R_{\max}} = \frac{I^n}{I^n + \sigma_b^n}, \qquad (6.2)$$

where $\sigma_b$ is the value of the half-saturation constant that takes different values for
different background intensities, $I_b$. Thus, the photoreceptor adaptation modeled
by the Michaelis-Menten equation provides us with the most important mechanism
of adaptation.




FIGURE 6.10  Recovery of response after a long exposure to background intensities [19].
(Plot of relative response against log luminance in cd/m², showing the peak and plateau
curves; one response curve is labeled log L = 4.5.)


Response-threshold Relation The observed linear relationship between the visual
threshold and background intensity (the TVI relationship from Section 6.2)
can be derived from the cellular adaptation model. (See Figure 6.12 for an intuitive
derivation.) For this derivation we assume that the threshold $\Delta I_b$ is the incremental
intensity required to create an increase in cellular response by a small criterion
amount $\delta$ [45,134]. Based on this assumption, we derive $\Delta I_b$ from the response
equation as follows. Rearranging Equation 6.2 yields

$$I = \sigma_b \left(\frac{R}{R_{\max} - R}\right)^{\frac{1}{n}}.$$
FIGURE 6.11  Photoreceptor response adaptation to different background intensities. The plateau
of Figure 6.10 is shown in gray. It represents the locus of the photoreceptor response to the
background intensity itself.


FIGURE 6.12  Pictorial illustration of the response-threshold relation. The figure at the top plots
three photoreceptor response functions at three different background luminances ($L_1$, $L_2$, $L_3$ from
left to right) about 2 log units apart from each other. The response to the background luminance
itself is shown by the * symbol on the plots. $\Delta R$ is a small and fixed amount of response above the
response to the background luminance, and the $\Delta L_i$ are the increments in luminance required to cause
the same $\Delta R$ response change. The figure at the bottom plots the $\Delta L_i$ values as a function of the
background luminances $L_i$. The three values corresponding to background luminances $L_1$, $L_2$, and
$L_3$ are shown as solid dots. The curve passing through the plotted $\Delta L_i$ values has a shape similar
to the TVI function shown in the earlier part of this chapter.
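By differentiating this expression with respect to $R$, we get

$$\begin{aligned}
\frac{dI}{dR} &= \sigma_b \cdot \frac{1}{n} \cdot \left(\frac{R}{R_{\max} - R}\right)^{\frac{1}{n} - 1} \cdot \frac{R_{\max}}{(R_{\max} - R)^2} \\
&= \sigma_b \cdot \frac{1}{n} \cdot \frac{R_{\max}}{(R_{\max} - R)^{\frac{n+1}{n}}} \cdot R^{\frac{1-n}{n}}.
\end{aligned}$$

This gives an expression for the incremental intensity (i.e., $dI$) required to increase
the response of the system by $dR$. If we assume that the criterion response amount
$\delta$ for the threshold condition is small enough, from the previous equation it is
possible to compute the expression for $\Delta I$ as

$$\begin{aligned}
\Delta I &\approx \delta \cdot \frac{dI}{dR} \\
&= \delta \cdot \sigma_b \cdot \frac{1}{n} \cdot \frac{R_{\max}}{(R_{\max} - R)^{\frac{n+1}{n}}} \cdot R^{\frac{1-n}{n}}.
\end{aligned}$$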


Note that in all these equations $R$ is the response of the cellular system exposed to
intensity $I$, which may be different from the background intensity $I_b$ to which the
system is adapted. For threshold conditions, we can write $R = R_b + \delta$, where $R_b$ is
the plateau response of the system at the background intensity $I_b$. Thus,

$$\Delta I_b = \delta \cdot \sigma_b \cdot \frac{1}{n} \cdot \frac{R_{\max}}{(R_{\max} - R_b - \delta)^{\frac{n+1}{n}}} \cdot (R_b + \delta)^{\frac{1-n}{n}}.$$

For dark-adapted cells, the response of the system $R_b = 0$. Thus, the expression of
the threshold under a dark adaptation condition is

$$\Delta I_{dark} = \delta \cdot \sigma_{dark} \cdot \frac{1}{n} \cdot \frac{R_{\max}}{(R_{\max} - \delta)^{\frac{n+1}{n}}} \cdot \delta^{\frac{1-n}{n}}.$$




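The relative threshold, $\Delta I_b / \Delta I_{dark}$, for adaptation at any other background intensity
$I_b$ is

$$\begin{aligned}
\frac{\Delta I_b}{\Delta I_{dark}} &= \frac{\sigma_b}{\sigma_{dark}} \left(\frac{R_b + \delta}{\delta}\right)^{\frac{1-n}{n}} \left(\frac{R_{\max} - \delta}{R_{\max} - R_b - \delta}\right)^{\frac{n+1}{n}} \\
&\approx \frac{\sigma_b}{\sigma_{dark}} \left(\frac{\delta}{R_b}\right)^{\frac{n-1}{n}} \left(\frac{R_{\max}}{R_{\max} - R_b}\right)^{\frac{n+1}{n}} \\
&= \frac{\sigma_b}{\sigma_{dark}} \left(\frac{\delta}{R_{\max}}\right)^{\frac{n-1}{n}} \left(\frac{I_b^n + \sigma_b^n}{I_b^n}\right)^{\frac{n-1}{n}} \left(\frac{I_b^n + \sigma_b^n}{\sigma_b^n}\right)^{\frac{n+1}{n}} \\
&= \frac{1}{\sigma_{dark}} \left(\frac{\delta}{R_{\max}}\right)^{\frac{n-1}{n}} \frac{(I_b^n + \sigma_b^n)^2}{I_b^{\,n-1}\,\sigma_b^n}.
\end{aligned}$$

For $n = 1$ and $I_b = \sigma_b$, $\Delta I_b / \Delta I_{dark}$ is directly proportional to $I_b$. This relation is in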
agreement with the Weber relation seen in TVI measurements. Thus, Weber’s law 
may be considered as a behavioral manifestation of photoreceptor adaptation. The 
preceding discussion of the various mechanisms of visual adaptation affords the 
following conclusions. 


• Photoreceptor adaptation plays a very important role in visual adaptation.
An appropriate mathematical model of this adaptation (for example, Equation 6.2)
can be effectively used to tone map HDR images. The TVI relation
can be derived from the photoreceptor adaptation model, and hence can be
used as an alternate mathematical model for tone mapping.

• The rod and cone combination extends the effective range over which the
human visual system operates. Depending on the range of intensities present
in an image, the appropriate photoreceptor system or combination of them
may be chosen to achieve realistic tone mapping.
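
The link between the photoreceptor model and Weber's law derived above can also be checked numerically. The sketch below is illustrative only: it assumes $\sigma_b = I_b$ (the condition used in the derivation), a hypothetical criterion response of 0.0025, and function names of our choosing. It inverts Equation 6.2 exactly rather than using the small-$\delta$ approximation.

```python
def response(I, sigma_b, n=1.0, r_max=1.0):
    """Equation 6.2: adapted photoreceptor response."""
    return r_max * I ** n / (I ** n + sigma_b ** n)

def inverse_response(R, sigma_b, n=1.0, r_max=1.0):
    """Equation 6.2 rearranged for intensity: I = sigma_b (R / (r_max - R))^(1/n)."""
    return sigma_b * (R / (r_max - R)) ** (1.0 / n)

def threshold(I_b, delta=0.0025, n=1.0, r_max=1.0):
    """Increment in intensity that raises the response by delta above R_b."""
    sigma_b = I_b  # assumption: semisaturation constant tracks the background
    R_b = response(I_b, sigma_b, n, r_max)
    return inverse_response(R_b + delta, sigma_b, n, r_max) - I_b

if __name__ == "__main__":
    # The ratio threshold / background should be the same at every level.
    for I_b in (0.01, 1.0, 100.0, 10000.0):
        dI = threshold(I_b)
        print(f"I_b = {I_b:>8}  threshold = {dI:.6g}  ratio = {dI / I_b:.5f}")
```

With these assumed values the ratio comes out near 1% at every background level, which is the constant Weber fraction seen in the TVI measurements.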
6.3 VISUAL ADAPTATION MODELS FOR HDR TONE
MAPPING


Figure 6.13 outlines a basic framework for HDR tone mapping using models of 
visual adaptation. The two key features of the framework are forward and inverse 
adaptation models. The forward adaptation model will process the scene luminance 
values and extract visual appearance parameters appropriate for realistic tone map- 
ping. The inverse adaptation model will take the visual appearance parameters and 
the adaptation parameters appropriate to the display viewing condition and will out- 
put the display luminance values. Either of the visual adaptation models discussed 
in the previous section (photoreceptor adaptation model or threshold adaptation 
model) may be used for forward and inverse adaptation. Most tone-mapping al- 
gorithms available today make use of one of these models. To achieve the goal of 
realistic HDR compression, these algorithms use photoreceptor responses or JNDs 


FIGURE 6.13  Framework for solving the tone-mapping problem using visual adaptation. (Block
diagram with the labels: scene intensities, forward adaptation model, display.)


as the correlates of the visual appearance. In this section, we explore various algorithms
and show their relation to the visual adaptation models discussed in the
previous section. 


6.3.1 PHOTORECEPTOR ADAPTATION MODEL FOR 
TONE MAPPING 


This section brings together a large number of tone-mapping algorithms. The com- 
mon relationship between them is the use of an equation similar to the photorecep- 
tor adaptation equation (Equation 6.2) presented in Section 6.2. In the following 
paragraphs we only show the similarity of the equation used to the photoreceptor 
adaptation equation, and defer the discussion of the details of these algorithms to 
the following two chapters. Here we show the actual form of the equations used in 
the algorithms, and where required rewrite them such as to bring out the similarity 
with Equation 6.2. In their rewritten form they are functionally identical to their 
original forms. It is important to note that although these equations may be derived 
from the same adaptation equations they largely differ in their choice of the value 
of the parameters, and only a few of them specifically claim the algorithm to be 
based on visual adaptation. Thus, all but a few of these algorithms (see [94,95]) 
ignore the inverse adaptation. They use Equation 6.2 because of its several desirable 
properties. These properties are as follows. 


• Independent of input intensity, the relative response is limited to between 0
and 1. Thus, the relative response output can be directly mapped to display
pixel values.

• The response function shifts along the intensity axis in such a way that the
response of the background intensity is well within the linear portion of the
response curve.

• The equation has a near linear response to the intensity in the log domain for
about 4 log units. The intensity ranges of most natural scenes without any
highlights or directly visible light sources do not exceed 4 log units. Thus,
such scenes afford an approximately logarithmic relation between intensity
and response.
Rational Quantization Function Schlick used the following mapping function
for computing display pixel values from pixel intensity ($I$) [113].

$$\begin{aligned}
F(I) &= \frac{pI}{pI - I + I_{\max}} \qquad &\text{[Original form]} \\
&= \frac{I}{I + \dfrac{I_{\max} - I}{p}} \qquad &\text{[Rewritten form]}
\end{aligned}$$

Here $I_{\max}$ is the maximum pixel value, and $p$ takes a value in the range $[1, \infty)$. We
can directly relate this equation to Equation 6.2 by substituting 1 for $n$ and $(I_{\max} - I)/p$
for $\sigma_b$ in that equation. Note that the value of $\sigma_b$ depends on the value of $I$ itself,
which may be interpreted as if the value of every pixel served as the background
intensity in the computation of the cellular response.
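
The equivalence of the two forms is easy to confirm numerically. The following sketch simply evaluates both expressions; the function names and the sample values of $I_{\max}$ and $p$ are ours, picked for illustration.

```python
def schlick_original(I, I_max, p):
    """Original form: F(I) = pI / (pI - I + I_max)."""
    return p * I / (p * I - I + I_max)

def schlick_rewritten(I, I_max, p):
    """Rewritten form: Equation 6.2 with n = 1 and sigma_b = (I_max - I) / p."""
    sigma_b = (I_max - I) / p
    return I / (I + sigma_b)

if __name__ == "__main__":
    I_max, p = 1000.0, 50.0  # illustrative values only
    for I in (0.0, 1.0, 10.0, 100.0, 1000.0):
        a = schlick_original(I, I_max, p)
        b = schlick_rewritten(I, I_max, p)
        print(f"I = {I:>7}  original = {a:.4f}  rewritten = {b:.4f}")
```

Both forms map 0 to 0 and $I_{\max}$ to 1, and larger values of $p$ lift the dark end of the range more strongly.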


Gain Control Function Pattanaik et al. introduced a gain control function for 
simulating the response of the human visual system and used this gain-controlled 
response for tone mapping [94]. They proposed two different equations for mod- 
eling the response of rod and cone photoreceptors. The equations are 


$$F_{cone}(I) = \frac{I}{c_1 (I_b + c_2)^n}$$

$$F_{rod}(I) = \frac{r_1}{(r_2 I_b^2 + r_1)^n} \cdot \frac{I}{r_3 (I_b + r_4)^n},$$

where the $c$s and $r$s are constants chosen to match certain psychophysical measurements.
In their formulation, the $I$s represent light intensity of the image pixels of
successively low-pass filtered versions of the image, and for every level of the image
the background intensity $I_b$ is chosen as the intensity of the pixel at the next level.
These equations have a vague similarity with Equation 6.2 and have been given here 
for completeness. 
S-shaped Curve Tumblin et al. used an S-shaped curve (sigmoid) as their tone-mapping
function [129]. The equation of this curve is

$$\begin{aligned}
F(I) &= \left[\frac{\left(\dfrac{I}{I_b}\right)^n + \dfrac{1}{k}}{\left(\dfrac{I}{I_b}\right)^n + k}\right] \cdot D \qquad &\text{[Original form]} \\
&= \left[\frac{I^n}{I^n + k I_b^n} + \frac{I_b^n}{k\,(I^n + k I_b^n)}\right] \cdot D, \qquad &\text{[Rewritten form]}
\end{aligned}$$

where $k$, $D$, and $n$ are the parameters for adjusting the shape and size of the
S-shaped curve. According to the authors, this function is inspired by Schlick’s
quantization function, shown previously. The rewritten equation has two parts. The
first part is identical to Equation 6.2 (with $\sigma_b^n = k I_b^n$). The second part of the
equation makes it an S-shaped function on a log-log plot.


Photoreceptor Adaptation Model Pattanaik et al. [95] and Reinhard and Dev- 
lin [108] made explicit use of Equation 6.2 for tone mapping. Pattanaik et al. used 
separate equations for rods and cones to account for the intensity in scotopic and 
photopic lighting conditions. The $\sigma_b$ values for rods and cones were computed from
the background intensity using

$$\sigma_{b,rod} = \frac{c_1 I_{b,rod}}{c_2\, j^2 I_{b,rod} + c_3 (1 - j^2)^4 I_{b,rod}^{1/6}}$$

$$\sigma_{b,cone} = \frac{c_4 I_{b,cone}}{k^4 I_{b,cone} + c_5 (1 - k^4)^2 I_{b,cone}^{1/3}},$$

where $j$ and $k$ are auxiliary factors of the form $1/(c\,I_b + 1)$ computed from the rod and cone
background intensities, and $I_{b,rod}$, $I_{b,cone}$ are respectively the background intensities for the rods and cones.
Pattanaik and Yee extended the use of these functions to tone map HDR images [96].
Reinhard and Devlin provided the following much simpler equation for computing
$\sigma_b$ at a given background intensity.

$$\sigma_b = (f I_b)^m.$$

Here, $f$ and $m$ are constants and are treated as user parameters in this tone-mapping
algorithm.


Photographic Tone-mapping Function The photographic tone-mapping func- 
tion used by Reinhard et al. [106,109] is very similar to Equation 6.2. The equation 
can be written in the following form. 


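$$\begin{aligned}
F(I) &= \frac{a\,\dfrac{I}{I_b}}{1 + a\,\dfrac{I}{I_b}} \qquad &\text{[Original form]} \\
&= \frac{I}{I + \dfrac{I_b}{a}} \qquad &\text{[Rewritten form]}
\end{aligned}$$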


Here, a is a scaling constant appropriate to the illumination range (key) of the 
image scene. 
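
The correspondence with Equation 6.2 can be illustrated with a short sketch. The function names are ours, and the default key value of 0.18 is an assumption made for the example (a commonly quoted middle-gray setting), not a value given in this section.

```python
def photographic(I, I_b, a=0.18):
    """Original form: scale by the key a relative to the background, then compress."""
    scaled = a * I / I_b
    return scaled / (1.0 + scaled)

def adaptation_form(I, I_b, a=0.18):
    """Rewritten form: Equation 6.2 with n = 1 and sigma_b = I_b / a."""
    sigma_b = I_b / a
    return I / (I + sigma_b)

if __name__ == "__main__":
    I_b = 2.0  # illustrative background intensity
    for I in (0.02, 0.2, 2.0, 20.0, 200.0):
        print(f"I = {I:>6}  original = {photographic(I, I_b):.4f}  "
              f"rewritten = {adaptation_form(I, I_b):.4f}")
```

An intensity equal to $I_b / a$ maps to one half of the output range, so raising the key $a$ brightens the result for a fixed background.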


6.3.2 THRESHOLD VERSUS INTENSITY MODEL FOR 
TONE MAPPING 


In the previous section we have shown the relationship between the TVI model and 
the photoreceptor adaptation model. Thus, it is obvious that the TVI model can be 
used for tone reproduction. Ward’s [139] tone-mapping algorithm is the first to 
make use of the TVI model. In his algorithm, Ward used a JND (just-noticeable
difference), the threshold $\Delta I_b$ at any background $I_b$, as a unit to compute the correlate
of the visual appearance parameter. From the scene pixel luminance $I_{scene}$
and the scene background luminance $I_{b,scene}$, Ward computed the ratio

$$\frac{I_{scene} - I_{b,scene}}{\Delta I_{b,scene}}.$$

This ratio represents the number of JNDs by which the pixel differs from the background.
Using the display background luminance $I_{b,display}$ and the display adaptation
threshold $\Delta I_{b,display}$, he inverted the JNDs to compute the display pixel luminance.
The inversion expression is as follows.

$$I_{display} = \text{JNDs} \times \Delta I_{b,display} + I_{b,display} \qquad (6.3)$$


Ferwerda et al. [35] later adapted this concept to compute JNDs specific to rods and 
cones for the purpose of tone-mapping images with a wide range of intensities. 
If the background intensity is locally adapted, the log-linear relationship of the 
threshold-to-background intensity provides the necessary range compression for 
HDR images. The issue of local versus global adaptation is discussed in the next 
section. 
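
The forward and inverse steps of the JND-based mapping can be sketched in a few lines. This is a simplified illustration rather than Ward's published algorithm: the threshold function here is a plain Weber fraction of 1% (an assumption standing in for a measured TVI curve), and all names are ours.

```python
def weber_threshold(I_b, fraction=0.01):
    """Hypothetical TVI stand-in: threshold is a fixed fraction of the background."""
    return fraction * I_b

def jnd_tone_map(I_scene, I_b_scene, I_b_display, tvi=weber_threshold):
    """Map a scene luminance to a display luminance with the same JND count."""
    # Forward step: how many JNDs the pixel lies from the scene background.
    jnds = (I_scene - I_b_scene) / tvi(I_b_scene)
    # Inverse step (Equation 6.3): place the pixel that many JNDs from the
    # display background.
    return jnds * tvi(I_b_display) + I_b_display

if __name__ == "__main__":
    I_b_scene, I_b_display = 1000.0, 50.0  # illustrative backgrounds in cd/m^2
    for I_scene in (500.0, 1000.0, 2000.0, 4000.0):
        out = jnd_tone_map(I_scene, I_b_scene, I_b_display)
        print(f"scene = {I_scene:>6}  display = {out:.2f}")
```

Under the pure Weber assumption the mapping collapses to a single scale factor, $I_{b,display} / I_{b,scene}$. A threshold function that departs from Weber's law at low luminance would change that factor with the adaptation level.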


6.4 BACKGROUND INTENSITY IN COMPLEX IMAGES 


In the previous sections we introduced two important adaptation models: the pho- 
toreceptor response model and the TVI model. Both of these adaptation models 
require knowledge of the background intensity $I_b$. For any use of either of these
models in tone reproduction, $I_b$ has to be computed from the intensity of the image
pixels. In this section we describe various methods commonly used to estimate
$I_b$ from an image.


6.4.1 IMAGE AVERAGE AS $I_b$


The average of the intensity of the image pixels is often used as the value of $I_b$. The
average could be the arithmetic average

$$I_b = \frac{1}{n} \sum_{i=1}^{n} I_i$$

or geometric average

$$I_b = \prod_{i=1}^{n} (I_i + \varepsilon)^{\frac{1}{n}},$$

where $n$ in the equations is the total number of pixels in the image, and $\varepsilon$ (an
arbitrary small increment) is added to the pixel intensities to take into account the
possibility of any zero pixel values in the image. The geometric average can also be
computed as

$$\exp\left(\frac{1}{n} \sum_{i=1}^{n} \log(I_i + \varepsilon)\right),$$

where the exponent $\frac{1}{n}\sum_{i=1}^{n} \log(I_i + \varepsilon)$ is the log average of the image pixels.

In the absence of any knowledge of the actual scene, one of these image averages 
is probably the most appropriate estimate of Jp for most images. A visual adaptation 
model using such an average is referred to as a global adaptation, and the tone- 
mapping method is referred to as global tone mapping. The geometric average is 
often the preferred method of average computation. This is largely because (1) the 
computed background intensity is less biased toward outliers in the image and (2) 
the relationship between intensity and response is log-linear. 
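
The difference between the two averages is easy to see numerically. The following is a minimal sketch in plain Python, assuming the image is available as a flat list of positive intensities; the function names, the value of `eps`, and the sample data are illustrative assumptions rather than part of any particular operator.

```python
import math

def arithmetic_average(pixels):
    """Arithmetic mean of the pixel intensities."""
    return sum(pixels) / len(pixels)

def geometric_average(pixels, eps=1e-6):
    """Geometric mean computed as exp of the log average.

    eps is the small increment that guards against log(0);
    its value here is an arbitrary assumption.
    """
    log_average = sum(math.log(v + eps) for v in pixels) / len(pixels)
    return math.exp(log_average)

# Hypothetical image: 99 dim pixels and a single very bright outlier.
pixels = [0.01] * 99 + [10000.0]
arith = arithmetic_average(pixels)  # pulled far toward the outlier
geom = geometric_average(pixels)    # stays close to the dim majority
```

With this sample the arithmetic average is dominated by the single bright pixel, whereas the geometric average remains near the intensity of the dim pixels, which is the outlier behavior described above.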


6.4.2 LOCAL AVERAGE AS $I_b$


In images with a very high dynamic range, the intensity change from region to re- 
gion can be drastic. Hence, the image average (also called global average) is not suf- 
ficiently representative of the background intensity of the entire image. The proper 
approach in such cases would be to segment the image into regions of LDR and 
use the average of pixels in each region. Yee and Pattanaik’s work shows that such 
segmentation in natural images is not always easy, and that tone mapping using the 
local average from regions obtained using existing segmentation techniques may 
introduce artifacts at region boundaries [150]. 

An alternative and popular approach is to compute a local average for every pixel 
p in the image from its neighboring pixels. The various techniques under this cat- 
egory include box filtering and Gaussian filtering. These techniques are easily com- 
puted. The computation may be expressed as 


$$I_{b,p} = \frac{1}{\sum_{i \in \Omega} w(p,i)} \sum_{i \in \Omega} w(p,i)\, I_i. \tag{6.4}$$

For Gaussian filtering

$$w(p,i) = \exp\left(-\frac{\|p - i\|^2}{s^2}\right).$$

For box filtering, it is expressed as

$$w(p,i) = \begin{cases} 1 & \text{for } \|p - i\| < s, \\ 0 & \text{otherwise.} \end{cases}$$

In these equations, $\Omega$ represents all pixels of the image around $p$, $\|\cdot\|$ is the spatial
distance function, and s is a user-defined size parameter in these functions. Effec- 
tively, the value of s represents the size of a circular neighborhood around the pixel 
p that influences the average value. 
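
As an illustration of Equation 6.4, the sketch below computes the local average with either weighting function. It is a direct, unoptimized implementation on a list-of-rows image; the function name, the truncation of the Gaussian window at three times `s`, and the test image are assumptions made for this example.

```python
import math

def local_average(image, s, weight="box"):
    """Weighted local average per pixel, following the form of Equation 6.4.

    image  : list of rows of intensities
    s      : size parameter of the weighting function
    weight : "box" (circular box) or "gaussian"
    """
    h, w = len(image), len(image[0])
    # Assumption: the Gaussian is truncated at 3*s, where its weight is negligible.
    r = int(math.ceil(3 * s)) if weight == "gaussian" else int(math.ceil(s))
    out = [[0.0] * w for _ in range(h)]
    for py in range(h):
        for px in range(w):
            num = den = 0.0
            for iy in range(max(0, py - r), min(h, py + r + 1)):
                for ix in range(max(0, px - r), min(w, px + r + 1)):
                    d2 = (py - iy) ** 2 + (px - ix) ** 2
                    if weight == "gaussian":
                        wt = math.exp(-d2 / (s * s))
                    else:
                        wt = 1.0 if d2 < s * s else 0.0
                    num += wt * image[iy][ix]
                    den += wt
            out[py][px] = num / den
    return out

# Hypothetical HDR step edge: dark left half, bright right half.
image = [[1.0] * 4 + [1000.0] * 4 for _ in range(8)]
box = local_average(image, 2.0, "box")
gauss = local_average(image, 2.0, "gaussian")
```

Away from the edge the box average of a dark pixel equals the dark intensity, but for the dark pixel adjacent to the edge it is pulled far above it, because bright pixels fall inside the neighborhood.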

Although for most pixels in the image the local average computed in this fashion 
is representative of the background intensity, the technique breaks down at HDR 
boundaries. This is due to the fact that the relatively large disparity in pixel inten- 
sities in the neighborhood of the boundary biases the average computation. Thus, 
the background intensity computed for pixels on the darker side of the boundary is 
positively biased, and those computed for the pixels on the brighter side are neg- 
atively biased. This biasing gives rise to halo artifacts in the tone-mapped images. 
Figure 6.14 highlights the problem. The image shown is computed using local box- 
filtered values for the background intensity. Note the dark band on the darker side 
of the intensity boundary. Although not noticeable, similar bright banding exists on 
the brighter side of the boundary. 

This problem can be avoided by computing the average from only those pixels 
whose intensities are within a reasonable range of the intensity of the pixel un- 
der consideration. The tone-mapped image in Figure 6.15 shows the result using 
background intensity from such an adaptive computational approach. There is a sig- 
nificant improvement in the image quality, but at an increased cost of computation. 
Two such computational approaches are discussed in the following sections. 


FIGURE 6.14 Halo artifacts associated with the use of $I_b$ computed by local averaging. The
artifacts are most noticeable at the illumination boundaries.


FIGURE 6.15 Tone mapping using adaptive local averaging.


Local Average Using Variable Size Neighborhood In this approach, the size
parameter s in Equation 6.4 is adaptively varied. Reinhard et al. and Ashikhmin
simultaneously proposed this very simple algorithm [6,109]. Starting from a value
of s equal to 1, they iteratively double its value until the pixels from across the HDR
boundary start to bias the average value. They assume that the average is biased if
it differs from the average computed with the previous size by a tolerance amount.
They use this s in Equation 6.4 for computing their local average.
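
The doubling scheme can be sketched as follows. This is a simplified reading of the idea, not the published algorithms: the bias test is taken to be a relative difference against a tolerance, the neighborhood is a circular box, and all names and the default tolerance are assumptions.

```python
def box_average(image, py, px, s):
    """Average of the pixels closer than s to (py, px), clipped at the image border."""
    h, w = len(image), len(image[0])
    r = int(s)
    total, count = 0.0, 0
    for iy in range(max(0, py - r), min(h, py + r + 1)):
        for ix in range(max(0, px - r), min(w, px + r + 1)):
            if (iy - py) ** 2 + (ix - px) ** 2 < s * s:
                total += image[iy][ix]
                count += 1
    return total / count

def adaptive_local_average(image, py, px, tolerance=0.05):
    """Double s until the average changes by more than the tolerance.

    Returns the last unbiased average and the size s at which it was computed.
    """
    max_s = max(len(image), len(image[0]))
    s = 1
    average = box_average(image, py, px, s)
    while 2 * s <= max_s:
        candidate = box_average(image, py, px, 2 * s)
        if abs(candidate - average) > tolerance * average:
            break  # pixels from across the boundary have started to bias the average
        s, average = 2 * s, candidate
    return average, s

# Hypothetical step edge between columns 7 and 8.
image = [[1.0] * 8 + [1000.0] * 8 for _ in range(16)]
far_avg, far_s = adaptive_local_average(image, 8, 2)    # pixel well inside the dark region
near_avg, near_s = adaptive_local_average(image, 8, 7)  # pixel next to the boundary
```

A pixel far from the boundary is allowed a larger neighborhood than a pixel next to it, and in both cases the resulting average is free of contributions from the bright side.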


Local Average Using Bilateral Filtering In this approach, the size parameter s 
remains unchanged, but the pixels around p are used in the average summation 
only if their intensity values are similar to the intensity of p. The similarity can be 
user defined. For example, the intensities may be considered similar if the difference 
or the ratio of the intensities is less than a predefined amount. Such an approach 
may be implemented by filtering both in spatial and intensity domains. The name 
“bilateral” derives from this dual filtering. The filter can be expressed as 


$$I_{b,p} = \frac{1}{\sum_{i \in \Omega} w(p,i)\, g(I_p, I_i)} \sum_{i \in \Omega} w(p,i)\, g(I_p, I_i)\, I_i, \tag{6.5}$$


where w() and g() are the two weighting functions that take into account the 
dual proximity. The forms of these weighting functions can be similar, but their 
parameters are different: for g() the parameters are the intensities of the two pixels, 
and for w() the parameters are the positions of the two pixels. 

Durand and Dorsey use Gaussian functions for both domains [23]. Pattanaik and 
Yee use a circular box function for w(), and an exponential function for g() [96]. 
Choudhury and Tumblin have proposed an extension to this technique to account 
for gradients in the neighborhood. They named their extension “trilateral filter- 
ing” [10]. 
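
A minimal sketch of Equation 6.5 for a single pixel is given below. Gaussian functions are used for both weights, with the intensity weight evaluated on the difference of log10 intensities; this particular choice of g(), the parameter names, and their default values are assumptions for illustration and are not taken from any of the cited implementations.

```python
import math

def bilateral_average(image, py, px, s=2.0, sigma_r=0.4):
    """Bilateral local average at pixel (py, px), following the form of Equation 6.5.

    s       : spatial size parameter of w()
    sigma_r : width of g() in log10 intensity units (assumed form)
    Intensities must be strictly positive.
    """
    h, w = len(image), len(image[0])
    r = int(math.ceil(3 * s))  # assumption: truncate the spatial Gaussian at 3*s
    log_ip = math.log10(image[py][px])
    num = den = 0.0
    for iy in range(max(0, py - r), min(h, py + r + 1)):
        for ix in range(max(0, px - r), min(w, px + r + 1)):
            d2 = (py - iy) ** 2 + (px - ix) ** 2
            w_spatial = math.exp(-d2 / (s * s))
            d_log = math.log10(image[iy][ix]) - log_ip
            g_intensity = math.exp(-(d_log * d_log) / (sigma_r * sigma_r))
            weight = w_spatial * g_intensity
            num += weight * image[iy][ix]
            den += weight
    return num / den

# Hypothetical step edge; pixel (4, 3) is the dark pixel adjacent to the boundary.
image = [[1.0] * 4 + [1000.0] * 4 for _ in range(8)]
bilateral = bilateral_average(image, 4, 3)
spatial_only = bilateral_average(image, 4, 3, sigma_r=1e9)  # g() is effectively 1
```

With the intensity weight enabled, bright pixels across the boundary contribute almost nothing to the average of the dark pixel; with g() effectively disabled the result reverts to the biased Gaussian average.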

Figure 6.16 shows the linearly scaled version of the original HDR image and 
the images assembled from intensities computed for each pixel using some of the 
adaptive local adaptation techniques discussed in this section. 


6.4.3 MULTISCALE ADAPTATION 


Although the use of local averages as the background intensity is intuitive, the choice
of the size of the locality is largely ad hoc. In this section we provide some empirical
support for the use of local averages and the associated importance of the size of the
locality.

FIGURE 6.16 Local averages for a sample HDR image (a). Images (b) and (c) were computed
using Equation 6.4, and images (d) and (e) were computed using Equation 6.5. Equal weighting
is used for images (b) and (d) and Gaussian weighting for images (c) and (e). g() for image (d)
is from Pattanaik [96] and for image (e) from Durand and Dorsey [23]. (HDR image courtesy
Columbia University.)

Physiological and psychophysical evidence indicates that the early stages of visual 
processing can be described as the filtering of the retinal image by bandpass mech- 
anisms sensitive to patterns of different scales [146]. These bandpass mechanisms 
adapt independently to the average intensity within a region of a scene defined by 
their spatial scale. In a complex scene, this average will be different at different 
scales and thus the mechanisms will all be in different states of adaptation. Thus, to 
correctly account for the changes in vision that occur with changes in the level of 
illumination we need to consider local adaptation at different spatial scales within 
HDR environments. Peli suggests that an appropriate way of characterizing the ef- 
fects of local adaptation on the perception of scenes is to use low-pass images that 
represent the average local luminance at each location in the image at different 
spatial scales [98]. Reinhard et al. [109] and Ashikhmin [6] use this multiscale approach
to adaptively decide the effective neighborhood size. Pattanaik et al.'s [94]
multiscale adaptation also demonstrates the usefulness of the multiscale nature of
the visual system in HDR tone mapping.


6.5 DYNAMICS OF VISUAL ADAPTATION 


In earlier sections we discussed the adaptation of the visual system to background 
intensity. However, visual adaptation is not instantaneous. In the course of the day, 
light gradually changes from dim light at dawn to bright light at noon, and back 
to dim light at dusk. This gradual change gives the visual system enough time 
to adapt, and hence the relatively slow nature of visual adaptation is not noticed. 
However, any sudden and drastic change in illumination, from light to dark or dark 
to light, makes the visual system lose its normal functionality momentarily. This 
loss of sensitivity is experienced as total darkness during a light-to-dark transition, 
and as a blinding flash during a dark-to-light transition. Following this momentary 
loss in sensitivity, the visual system gradually adapts to the prevailing illumination 
and recovers its sensitivity. This adaptation is also experienced as a gradual change 
in perceived brightness of the scene. 

The time course of adaptation, the duration over which the visual system grad- 
ually adapts, is not symmetrical. Adaptation from dark to light, known as light adap- 
tation, happens quickly (in a matter of seconds), whereas dark adaptation (adaptation 
from light to dark) occurs slowly (over several minutes). We experience the dark- 
adaptation phenomenon when we enter a dim movie theater for a matinee. Both 
adaptation phenomena are experienced when we drive into and out of a tunnel 
on a sunny day. The capability of capturing the full range of light intensities in 
HDR images and video poses new challenges in terms of realistic tone mapping of 
video-image frames during the time course of adaptation. 

In Section 6.2 we argued that vision is initiated by the photochemical interac- 
tion of photons with the photo-pigments of the receptor. This interaction leads to 
bleaching and hence to loss of photo-pigments from receptors. The rate of photon 
interaction and hence the rate of loss in photo-pigments is dependent on the in- 
tensity of light, on the amount of photo-pigment present, and on photosensitivity. 
A slow chemical regeneration process replenishes lost photo-pigments. The rate of 
regeneration depends on the proportion of bleached photo-pigments and on the 
time constant of the chemical reaction.
From the rate of bleaching and the rate of regeneration it is possible to compute
the equilibrium photo-pigment concentration for a given illumination level. Be- 
cause the rate of photon interaction is dependent on the amount of photo-pigments 
present, and because the bleaching and regeneration of bleached photo-pigments 
are not instantaneous, visual adaptation and its time course were initially thought 
to be directly mediated by the concentration of unbleached photo-pigments present 
in the receptor. 

Direct cellular measurements on isolated and whole rat retinas by Dowling ([19] 
Chapter 7) show that dark adaptation in both rods and cones begins with a rapid 
decrease in threshold followed by a slower decrease toward the dark-adaptation 
threshold. The latter slow adaptation is directly predicted by photo-pigment con- 
centrations, whereas the rapid adaptation is attributed almost entirely to a fast neural 
adaptation process that is not well understood. 

The Michaelis-Menten equation (Equation 6.1) models the photoreceptor response
and accounts for visual adaptation by changing the $\sigma_b$ value as a function of
background intensity. Photoreceptor adaptation and pigment bleaching have been
proposed to account for this change in $\sigma_b$ value. Valeton and van Norren have
modeled the contribution of these two mechanisms to the increase in $\sigma_b$ with

$$\sigma_b = \sigma_{\mathrm{dark}}\, \sigma_{b,\mathrm{neural}}\, \sigma_{b,\mathrm{bleach}}, \tag{6.6}$$

where $\sigma_{\mathrm{dark}}$ is the semisaturation constant for dark conditions, $\sigma_{b,\mathrm{neural}}$ accounts for
the loss in sensitivity due to neural adaptation, and $\sigma_{b,\mathrm{bleach}}$ accounts for the loss
in sensitivity due to loss of photo-pigment [133]. The value of $\sigma_{b,\mathrm{bleach}}$ is inversely
proportional to the fraction of unbleached photo-pigments at the background light.

Pattanaik et al. extended the use of the adaptation model to compute the time 
course of adaptation for simulating visual effects associated with a sudden change 
in intensities from dark to light, or vice versa [95]. They use a combination of 
Equations 6.1 and 6.6 to carry out the simulation, as follows.

$$\frac{R}{R_{\max}} = \frac{I^n}{I^n + \sigma_b^n(t)}$$

$$\sigma_b(t) = \sigma_{\mathrm{dark}}\, \sigma_{b,\mathrm{neural}}(t)\, \sigma_{b,\mathrm{bleach}}(t)$$

Here, time-dependent changes of $\sigma_{b,\mathrm{neural}}$ and $\sigma_{b,\mathrm{bleach}}$ are modeled with exponential
decay functions.
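
To make the time course concrete, the following sketch evaluates the two equations above for a sudden dark-to-light transition. Every numerical value in it (start and target factors, time constants, the exponent n, and the intensity) is a hypothetical placeholder chosen only to show the shape of the behavior; none is a measured or published constant.

```python
import math

def relax(start, target, tau, t):
    """Exponential decay from start toward target with time constant tau."""
    return target + (start - target) * math.exp(-t / tau)

def sigma_b(t, sigma_dark, neural, bleach):
    """Time-dependent semisaturation value.

    neural and bleach are (start, target, tau) tuples for the two factors.
    """
    return sigma_dark * relax(*neural, t) * relax(*bleach, t)

def response(intensity, sigma, n=1.0):
    """Normalized photoreceptor response R / R_max."""
    return intensity ** n / (intensity ** n + sigma ** n)

# Hypothetical dark-to-light transition at t = 0.
SIGMA_DARK = 1.0
NEURAL = (1.0, 50.0, 0.1)    # fast neural factor: start, target, tau in seconds
BLEACH = (1.0, 4.0, 100.0)   # slow bleaching factor: start, target, tau in seconds
INTENSITY = 100.0

r_initial = response(INTENSITY, sigma_b(0.0, SIGMA_DARK, NEURAL, BLEACH))
r_adapted = response(INTENSITY, sigma_b(1e6, SIGMA_DARK, NEURAL, BLEACH))
```

Immediately after the transition the response is close to saturation, which corresponds to the blinding flash described earlier; as the semisaturation value grows, the response settles to a lower value and sensitivity is recovered.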


6.6 SUMMARY 


This chapter proposed the view that the modeling of human visual adaptation is key 
to realistic tone mapping of HDR images. We saw that photoreceptor adaptation is 
the most important factor responsible for visual adaptation, with Equation 6.1 be- 
ing the mathematical model for this adaptation. The relation between various tone- 
mapping algorithms and the photoreceptor adaptation model was made evident. 
Background intensity is a key component in this model. Some of the commonly 
used methods for computing this background intensity in images were discussed. 
We also saw the usefulness of a human visual model in realistic simulation of vi- 
sual effects associated with the wide range of real-life illuminations. Whereas this 
chapter explored the similarities between several current tone-reproduction oper- 
ators, the following two chapters discuss their differences and present each tone- 
reproduction operator in detail. 


Spatial Tone Reproduction 


In this and the following chapter we discuss specific algorithms that prepare HDR
images for display on LDR display devices. These algorithms are called
tone-reproduction or tone-mapping operators (we do not distinguish between these two
terms). For each operator we describe how dynamic range reduction is achieved,
which user parameters need to be specified, and how these user parameters affect the
displayed material. These chapters are intended as a reference for those who want to
understand specific operators with a view toward implementing them.

Tone-reproduction operators may be classified in several ways. The classification 
followed here is to distinguish operators loosely based on how light reflects from 
a diffuse surface (as discussed in the following chapter) from operators working 
directly on pixels (i.e., operating in the spatial domain).

A common classification of spatial tone-reproduction operators distinguishes be- 
tween “local” and “global” operators, as discussed in Chapter 6. In summary, a local 
operator would compute a local adaptation level for each pixel based on the pixel 
value itself, as well as a neighborhood of pixels surrounding the pixel of interest. 
This local adaptation level then drives the compression curve for this pixel. Because 
the neighborhood of a pixel helps determine how this pixel is compressed, a bright 
pixel in a dark neighborhood will be treated differently than a bright pixel in a 
bright neighborhood. A similar argument can be made for dark pixels with bright 
and dark neighborhoods. 

If an operator uses the entire image as the neighborhood for each pixel, it is
called global. Within an image, each pixel is compressed through a
compression curve that is the same for all pixels. As a result, global operators are
frequently less expensive to compute than local operators.

Alternatively, tone reproduction may be achieved by transforming the image into 
a different representation, such as with use of the Fourier domain or by differen- 
tiation. These operators form different classes, and are discussed in the following 
chapter. Thus, four different approaches to dynamic range reduction are distin- 
guished in this book, and each tone-reproduction operator may be classified as one 
of the following four broad categories. 


• Global operators: Compress images using an identical (nonlinear) curve for each
pixel.

• Local operators: Achieve dynamic range reduction by modulating a nonlinear
curve by an adaptation level derived for each pixel independently by
considering a local neighborhood around each pixel.

• Frequency domain operators: Reduce the dynamic range of image components
selectively, based on their spatial frequency (Chapter 8).

• Gradient domain operators: Modify the derivative of an image to achieve dynamic
range reduction (Chapter 8).


Factors common to most tone-reproduction operators are discussed first, including 
treatment of color, homomorphic filtering, and Fourier domain decompositions. 
Then, global operators are cataloged in Section 7.2, followed by local operators in 
Section 7.3. 


7.1 PRELIMINARIES 


Because all tone-reproduction operators are aimed at more or less the same problem 
(namely, the appropriate reduction of dynamic range for the purpose of display), 
there are several ideas and concepts that are shared by many of them. In particular, 
the input data are often expected to be calibrated in real-world values. In addition, 
color is treated similarly by many operators. At the same time, several operators 
apply compression in logarithmic space, whereas others compress in linear space. 
Finally, most local operators make use of suitably blurred versions of the input image.
Each of these issues is discussed in the following sections.


7.1.1 CALIBRATION


Several tone-reproduction operators are inspired by aspects of human vision. The 
human visual response to light at different levels is nonlinear, and photopic and 
scotopic lighting conditions in particular lead to very different visual sensations (as 
discussed in the preceding chapter). For those tone-reproduction operators, it is 
important that the values to be tone mapped are specified in real-world units (i.e., 
in cd/m²). This allows operators to differentiate between a bright daylit scene and
a dim night scene. This is not generally possible if the image is given in arbitrary 
units (see, for example, Figure 7.1). 

However, unless image acquisition is carefully calibrated, images in practice may
be given in arbitrary units. For several tone-reproduction operators, this implies 
(for instance) that an uncalibrated night image may be tone mapped as if it were 
a representation of a daylit scene. Displaying such an image would give a wrong 
impression. 

Images may be calibrated by applying a suitably chosen scale factor. Without 
any further information, the value of such a scale factor can realistically only 
be approximated, either by trial and error or by making further assumptions on 
the nature of the scene. In this chapter and the next we show a progression 
of images for each operator requiring calibrated data. These images are gener- 
ated with different scale factors such that the operator’s behavior on uncalibrated 
data becomes clear. This should facilitate the choice of scale factors for other im- 
ages. 

Alternatively, it is possible to use heuristics to infer the lighting conditions for 
scenes depicted by uncalibrated images. In particular, the histogram of an image 
may reveal if an image is overall light or dark, irrespective of the actual values in the 
image. Figure 7.2 shows histograms of dark, medium, and light scenes. For many 
natural scenes, a dark image will have pixels with values located predominantly 
toward the left of the histogram. A light image will often display a peak toward 
the right of the histogram, with images between having a peak somewhere in the 
middle of the histogram. 

FIGURE 7.1 This image is given in arbitrary units, and is tone mapped three times with different
parameters using the photoreceptor-based technique discussed in Section 7.2.7. Without knowing
the actual scene, it is difficult to assess which of these three renditions is most faithful to the actual
scene. If the data were properly calibrated, the absolute values in the image would allow the overall
brightness to be determined.

FIGURE 7.2 Histograms for scenes that are overall dark (left), medium (middle), and light
(right). Panel titles include "Parking garage - low key" and "Grove D - mid key".

An important observation is that the shape of the histogram is determined both
by the scene being captured and the capture technique employed. Because our main
tool for capturing HDR images uses a limited set of differently exposed LDR images
(discussed in Chapter 4), images with the sun directly visible will still contain
burned-out regions. Similarly, low-level details in nighttime scenes may also not be
represented well in an HDR image captured with this method. These limitations affect
the shape of the histogram, and therefore the estimation of the key of the scene.
A number that correlates to the peak found in a histogram, but is not equal to the
location of the peak, is the log average luminance found in the image, calculated as

$$L_{av} = \exp\left(\frac{1}{N} \sum_{x,y} \log\big(L_w(x,y)\big)\right), \tag{7.1}$$

where the summation is only over non-zero pixels.
The key of a scene, a unitless number that relates to the overall light level, may 
be inferred from a histogram. It is thus possible to empirically relate the log average
luminance to the minimum and maximum luminance in the histogram (all three
are shown in the histograms of Figure 7.2). The key $\alpha$ may be estimated using the
following [106].

$$f = \frac{2 \log_2 L_{av} - \log_2 L_{\min} - \log_2 L_{\max}}{\log_2 L_{\max} - \log_2 L_{\min}} \tag{7.2}$$

$$\alpha = 0.18 \times 4^f \tag{7.3}$$


Here, the exponent f computes the distance of the log average luminance to the 
minimum luminance in the image relative to the difference between the mini- 
mum and maximum luminance in the image. To make this heuristic less dependent 
on outliers, the computation of the minimum and maximum luminance should 
exclude about 1% of the lightest and darkest pixels. For the photographic tone- 
reproduction operator (discussed in Section 7.3.6), a sensible approach is to first 
scale the input data such that the log average luminance is mapped to the estimated 
key of the scene, as follows. 


$$L_w'(x,y) = \frac{\alpha}{L_{av}}\, L_w(x,y) \tag{7.4}$$

Although unproven, this heuristic may also be applicable to other tone-reproduction 
techniques that require calibrated data. However, in any case the best approach 
would be to always use calibrated images. 
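
The heuristic of Equations 7.1 through 7.4 can be sketched in a few lines. The version below assumes a flat list of luminance values, takes N in Equation 7.1 to be the number of non-zero pixels, and excludes outliers by dropping a fixed fraction from each end of the sorted values; these details, and all function names, are assumptions of this sketch.

```python
import math

def log_average(lum):
    """Log average luminance over non-zero pixels (Equation 7.1)."""
    nonzero = [v for v in lum if v > 0.0]
    return math.exp(sum(math.log(v) for v in nonzero) / len(nonzero))

def estimate_key(lum, clip=0.01):
    """Estimate the key alpha of a scene (Equations 7.2 and 7.3).

    clip is the fraction of darkest and lightest pixels to exclude.
    Assumes the image is not constant, so that l_max > l_min.
    """
    nonzero = sorted(v for v in lum if v > 0.0)
    k = int(clip * len(nonzero))
    l_min, l_max = nonzero[k], nonzero[len(nonzero) - 1 - k]
    l_av = log_average(lum)
    f = ((2.0 * math.log2(l_av) - math.log2(l_min) - math.log2(l_max))
         / (math.log2(l_max) - math.log2(l_min)))
    return 0.18 * 4.0 ** f

def scale_to_key(lum, alpha):
    """Scale luminances so that the log average maps to alpha (Equation 7.4)."""
    l_av = log_average(lum)
    return [alpha / l_av * v for v in lum]

# Hypothetical image whose log average lies midway between minimum and maximum.
lum = [1.0, 10.0, 100.0] * 10
alpha = estimate_key(lum)
scaled = scale_to_key(lum, alpha)
```

For this sample the log average sits exactly halfway between the extremes in the log domain, so f is zero and the estimated key is the middle value of 0.18.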


7.1.2 COLOR IMAGES 


The human visual system is a complex mechanism with several idiosyncrasies 
that need to be accounted for when preparing an image for display. Most tone- 
reproduction operators attempt to reduce an image in dynamic range while keeping 
the human visual system’s response to the reduced set of intensities constant. This 
has led to various approaches that aim at preserving brightness, contrast, appear- 
ance, and visibility. 

However, it is common practice among many tone-reproduction operators to
exclude a comprehensive treatment of color. With few exceptions, it is generally
accepted that dynamic range compression should be executed on a single luminance
channel. Although this is the current state of affairs, this may change in the near
future, as the fields of color-appearance modeling and tone reproduction are grow- 
ing closer together. This is seen in Pattanaik’s multiscale observer model [94] and 
in more recent developments, such as Johnson and Fairchild’s iCAM model [29,30] 
and Reinhard and Devlin’s photoreceptor-based operator [108]. 

Most other operators derive a luminance channel from the input RGB values (as 
discussed in Section 2.4) and then compress the luminance channel. The luminance 
values computed from the input image are called world luminance ($L_w$). The
tone-reproduction operator of choice will take these luminance values and produce a
new set of luminance values $L_d$. The subscript $d$ indicates “display” luminance.
After compression, the luminance channel needs to be recombined with the un- 
compressed color values to form the final tone-mapped color image. 

To recombine luminance values into a color image, color shifts will be kept to
a minimum if the ratios between the color channels before and after compression
are kept constant [47,113,119]. This may be achieved if the compressed image
$R_d G_d B_d$ is computed as follows, where $R_w$, $G_w$, and $B_w$ denote the input color channels.

$$R_d = L_d \frac{R_w}{L_w}, \qquad G_d = L_d \frac{G_w}{L_w}, \qquad B_d = L_d \frac{B_w}{L_w}$$

Should there be a need to exert control over the amount of saturation in the image,
the fraction in the previous equations may be fitted with an exponent $s$, resulting
in a per-channel gamma correction as follows.

$$R_d = L_d \left(\frac{R_w}{L_w}\right)^s, \qquad G_d = L_d \left(\frac{G_w}{L_w}\right)^s, \qquad B_d = L_d \left(\frac{B_w}{L_w}\right)^s$$

The exponent $s$ is then given as a user parameter that takes values between 0 and 1.
For a value of 1, this method defaults to the standard method of keeping color ratios
constant. For smaller values, the image will appear more desaturated. Figure 7.3
demonstrates the effect of varying the saturation control parameter $s$. Full saturation
is achieved for a value of $s = 1$. Progressively more desaturated images may be
obtained by reducing this value.

FIGURE 7.3 The saturation parameter $s$ is set to 0.6, 0.8, and 1.0 (in reading order).
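
The recombination step with saturation control is small enough to show in full. In the sketch below the luminance is computed with the ITU-R BT.709 weights, which is an assumption about the color space of the input; the function names and the sample pixel are likewise illustrative.

```python
def luminance(rgb):
    """Luminance of a linear RGB triplet (assumes ITU-R BT.709 primaries)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def recombine(rgb_w, l_d, s=1.0):
    """Combine a compressed luminance l_d with the original color of a pixel.

    rgb_w : input (world) color of the pixel
    l_d   : display luminance produced by the tone-reproduction operator
    s     : saturation exponent between 0 and 1
    """
    l_w = luminance(rgb_w)
    return tuple(l_d * (c / l_w) ** s for c in rgb_w)

# Hypothetical pixel, tone mapped to a display luminance of 0.5.
pixel = (200.0, 100.0, 50.0)
full = recombine(pixel, 0.5, s=1.0)  # color ratios preserved
gray = recombine(pixel, 0.5, s=0.0)  # fully desaturated
```

For s = 1 the ratios between the channels are unchanged and the luminance of the result equals the display luminance; for s = 0 all three channels collapse to the display luminance.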

An alternative and equivalent way of keeping the ratios between color channels 
constant is to convert the image to a color space that has a luminance channel and 
two chromatic channels, such as the Yxy color space. If the image is converted 
to Yxy space first, the tone-reproduction operator will compress the luminance
channel Y and the result will be converted to RGB values for display. This approach
is functionally equivalent to preserving color ratios. 


7.1.3 HOMOMORPHIC FILTERING 


The “output” of a conventional 35-mm camera is a roll of film that needs to be 
developed, which may then be printed. The following examines the representation 
of an image as a negative toward defining terminology used in the remainder of 
this and the following chapter. Under certain conditions, it may be assumed that 
the image recorded by a negative is formed by the product of illuminance $E_v$ and
the surface reflectance $r$, as follows.


$$L_v = E_v\, r$$


This is a much simplified version of the rendering equation, which ignores specular 
reflection, directly visible light sources, and caustics and implicitly assumes that 
surfaces are diffuse. This simplification does not hold in general, but is useful for 
developing the idea of homomorphic filtering. 

If luminance is given, it may be impossible to retrieve either of its constituent
components, $E_v$ or $r$. However, for certain applications (including tone reproduction)
it may be desirable to separate surface reflectance from the signal. Although
this is generally an underconstrained problem, it is possible to transform the previous
equation to the log domain, where the multiplication of $E_v$ and $r$ becomes an
addition. Then, under specific conditions the two components may be separated.
Horn’s lightness computation, discussed in the following chapter, relies on this ob- 
servation. 

In general, processing applied in the logarithmic domain is called homomorphic 
filtering. We call an image represented in the logarithmic domain a density image for 
the following reason. A developed photographic negative may be viewed by shining 
a light through it and observing the transmitted pattern of light, which depends on 
the volume concentrations of amorphous silver suspended in a gelatinous emulsion. 
The image is thus stored as volume concentrations C(z), where z denotes depth, 
given that a transparency has a certain thickness. Transmission of light through 
media is governed by Beer’s law, and therefore the attenuation of luminance as a
function of depth may be expressed in terms of the previous volume concentrations
as

$$\frac{dL_v}{dz} = -k\, C(z)\, L_v,$$

with $k$ the attenuation constant. In the following, the luminance at a point on the
surface is denoted with $L_v(0)$. This equation may be solved by integration, yielding
the following solution.

$$\int_{L_v(0)}^{L_v} \frac{dL_v}{L_v} = -k \int_0^{z_t} C(z)\, dz$$

$$\ln\left(\frac{L_v}{L_v(0)}\right) = -k\, d$$

$$L_v = L_v(0) \exp(-k\, d)$$


Thus, if we integrate along a path from a point on the surface of the transparency 
to the corresponding point on the other side of the transparency we obtain a lu- 
minance value Ly attenuated by a factor derived from the volume concentrations 
of silver suspended in the transparency along this path. For a photographic trans- 
parency, the image is represented by the quantities d, which have a different value 
for each point of the transparency. The values of d have a logarithmic relationship 
with luminance Ly. This relationship is well known in photography, although it is 
usually represented in terms of density D, as follows. 


$$D = \log_{10}\left(\frac{L_v(0)}{L_v}\right)$$


The density D is proportional to d and is related to the common logarithm of Ly 
in a manner similar to the definition of the decibel [123]. Because all such repre- 
sentations are similar (barring the choice of two constant parameters), logarithmic 
representations are also called density representations. The general transformation 
between luminance and a density representation may be written as follows. 


$$D = \ln(L_v)$$

$$L_v = \exp(D)$$

Although a luminance representation of an image necessarily contains values that
are nonnegative, in a density representation the range of values is not bounded, as in
the following.

$$-\infty < D = \ln(E_v) + \ln(r) < \infty$$

$$-\infty < \ln(E_v) < \infty$$

$$-\infty < \ln(r) < 0$$


In addition, reflectance and illuminance are now added rather than multiplied, 
which is a direct result of operating in the log domain. Filtering operations such as 
tone reproduction may be carried out in this domain, which is then called homo- 
morphic filtering. The advantage of this representation is that under circumstances 
in which light behavior may be modeled as a product of illuminance and reflectance 
homomorphic filtering allows this product to be represented as an addition, which 
makes separation of the two components simpler. 
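
A one-dimensional sketch may help to show why the additive form is convenient. Here a moving average of the density signal stands in for the slowly varying illuminance term, the residual stands in for reflectance, and only the former is scaled down before returning to the luminance domain. The moving-average separation, the gain parameter, and the sample signal are all assumptions of this sketch; it is not one of the operators described in these chapters.

```python
import math

def moving_average(values, radius):
    """Simple low-pass filter: mean over a window clipped at the ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def homomorphic_compress(luminance, radius=4, illum_gain=0.5):
    """Compress the low-pass part of the density signal only."""
    density = [math.log(v) for v in luminance]           # D = ln(L_v)
    illum = moving_average(density, radius)              # estimate of ln(E_v)
    refl = [d - e for d, e in zip(density, illum)]       # residual, estimate of ln(r)
    return [math.exp(illum_gain * e + r) for e, r in zip(illum, refl)]

# Hypothetical scanline: a step in illuminance times alternating reflectances.
illuminance = [1.0] * 10 + [1000.0] * 10
reflectance = [0.2 if i % 2 == 0 else 0.8 for i in range(20)]
scanline = [e * r for e, r in zip(illuminance, reflectance)]

compressed = homomorphic_compress(scanline)
range_in = max(scanline) / min(scanline)
range_out = max(compressed) / min(compressed)
```

Because the separation is only approximate near the illuminance step, a crude filter of this kind reduces the dynamic range but also produces the boundary artifacts discussed in the preceding chapter.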


7.1.4 GAUSSIAN BLUR 


Several operators require the computation of local averages for each pixel. A local 
average may be viewed as a weighted average of the pixel and some of its neighbors. 
In most cases the weights are chosen according to a Gaussian distribution. Images 
filtered by a Gaussian filter kernel may be computed directly in the image domain, 
where the computation is a convolution, as follows. 


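$$L_{\mathrm{blur}}(x,y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} L(x - u, y - v)\, \frac{1}{\pi \sigma^2} \exp\left(-\frac{u^2 + v^2}{\sigma^2}\right) du\, dv$$


For discrete images, the integrals are replaced by summations. In this chapter we
will use the shorthand notation

$$L_{\mathrm{blur}} = L \otimes R$$

to indicate that image $L$ is convolved with filter kernel $R$.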
The Gaussian filter kernel is sampled at discrete points, normally at positions 
corresponding to the midpoints of each pixel. For very small filter kernels, point
sampling a Gaussian function with very few samples leads to a large error. To account
for the spacing between sample points, a fast way of integrating a Gaussian
function over an area may be achieved by expressing the Gaussian in terms of the
error function, which is given by¹

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2)\, dt.$$

We may therefore build an image $R(x, y)$ the size of the input image, which represents
the Gaussian filter (with the peak at pixel (0, 0)), as follows.

$$R(x,y) = \frac{1}{4} \left(\mathrm{erf}\left(\frac{x - 0.5}{\sigma}\right) - \mathrm{erf}\left(\frac{x + 0.5}{\sigma}\right)\right) \left(\mathrm{erf}\left(\frac{y - 0.5}{\sigma}\right) - \mathrm{erf}\left(\frac{y + 0.5}{\sigma}\right)\right)$$

The computational cost of the error function is not higher than evaluating the
exponential function. In this scheme, four error functions are executed per pixel, and
therefore the accuracy obtained by integration over each pixel’s area comes at a 
slight computational cost. This extra expense is acceptable because for certain ap- 
plications (such as the photographic tone-reproduction operator discussed in Sec- 
tion 7.3.6) the extra accuracy is nonnegligible. For all results shown in this and the 
following chapter (involving Gaussian blurred images) we have used this scheme. 
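The difference between point sampling and integrating over each pixel can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the division by σ√2 is an assumption that corresponds to a Gaussian written with 2σ² in the exponent (a different scaling of σ only changes the meaning of the width parameter).

```python
import math


def pixel_integrated_gaussian(size, sigma):
    """Kernel whose entries integrate a 2D Gaussian over each pixel's area.

    Assumption: the Gaussian is exp(-x^2 / (2 sigma^2)), hence the sqrt(2) below.
    """
    half = size // 2
    s = sigma * math.sqrt(2.0)

    def weight(i):
        # Integral of the normalized 1D Gaussian over [i - 0.5, i + 0.5].
        return 0.5 * (math.erf((i + 0.5) / s) - math.erf((i - 0.5) / s))

    w = [weight(i - half) for i in range(size)]
    # Two error functions per axis, so four per pixel in the product.
    return [[wy * wx for wx in w] for wy in w]


def point_sampled_gaussian(size, sigma):
    """Kernel obtained by sampling the Gaussian at pixel midpoints only."""
    half = size // 2
    norm = math.sqrt(2.0 * math.pi) * sigma
    g = [math.exp(-((i - half) ** 2) / (2.0 * sigma ** 2)) / norm
         for i in range(size)]
    return [[gy * gx for gx in g] for gy in g]


if __name__ == "__main__":
    for name, build in (("integrated", pixel_integrated_gaussian),
                        ("point sampled", point_sampled_gaussian)):
        kernel = build(9, 0.3)
        print(name, "kernel sums to", sum(map(sum, kernel)))
```

For a narrow kernel such as σ = 0.3, the integrated weights still sum to 1, whereas the point-sampled weights overshoot noticeably, which is the error the text refers to.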

The cost of blurring an image lies in the convolution operator. Because for every 
pixel every other pixel needs to be considered, direct convolution takes O(N²) 
time in the number of pixels N. For convolution kernels larger than 3 by 3 pixels 
(for example), this is too costly in practice. In such cases we may transform both 
filter kernel and image to the Fourier domain by means of a fast Fourier transform 
(FFT). The convolution then becomes a pointwise multiplication that takes O(N) 
time. The FFT and inverse FFT each take O(N log N) time, and thus blurring an 
image with a Gaussian filter kernel takes O(N log N) time in total. 


1 This idea was developed by Mike Stark and became part of the photographic tone-reproduction operator [109], whereby 
the robustness of the scale-selection mechanism improved as a result (see Section 7.3.6 for further details on scale 
selection). 

Before the FFT of the filter kernel can be computed, the Gaussian needs to be 
mirrored in the center of the kernel image. The center of the Gaussian is then 
replicated in each of the four corners of the image. The process of blurring an image 
is shown in Figure 7.4. Note that the FFT of the Gaussian filter kernel is again a 
Gaussian function, albeit now as a function of frequency. It would therefore be 
possible to construct the Gaussian filter kernel directly in the Fourier domain, thereby 
saving one FFT. 
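The Fourier-domain procedure described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the kernel is point sampled for brevity, and the image is treated as periodic (an FFT-based convolution wraps around at the borders).

```python
import numpy as np


def gaussian_kernel_image(height, width, sigma):
    """Gaussian kernel the size of the image, with its peak at pixel (0, 0).

    The signed offsets 0, 1, 2, ..., -2, -1 make the kernel wrap around,
    so the center of the Gaussian appears in the four corners.
    """
    y = np.fft.fftfreq(height) * height
    x = np.fft.fftfreq(width) * width
    gy = np.exp(-(y ** 2) / (2.0 * sigma ** 2))
    gx = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(gy, gx)
    return kernel / kernel.sum()  # normalize so overall brightness is kept


def fft_gaussian_blur(image, sigma):
    """Blur a 2D array by pointwise multiplication in the Fourier domain."""
    kernel = gaussian_kernel_image(image.shape[0], image.shape[1], sigma)
    spectrum = np.fft.fft2(image) * np.fft.fft2(kernel)
    return np.real(np.fft.ifft2(spectrum))


if __name__ == "__main__":
    impulse = np.zeros((16, 16))
    impulse[8, 8] = 1.0
    blurred = fft_gaussian_blur(impulse, 1.5)
    print(blurred.sum(), blurred[8, 8])
```

Blurring a single bright pixel reproduces the kernel centered on that pixel, which is a convenient way to check that the wrap-around placement is correct.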

Alternatively, it may be possible to truncate the Gaussian filter kernel to reduce 
the computational cost, or resort to fast approaches that are not based on Fourier 
decomposition. Examples are the elliptically weighted average approach [46] and 
Burt and Adelson’s approximation [8]. The 2D Gaussian filter is separable (i.e., it 
may be expressed as the multiplication of two 1D Gaussian filters), as follows. 


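$$R(x, y) = R_x(x)\, R_y(y)$$

$$\frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{y^2}{2\sigma^2}\right)$$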


This means that the Gaussian convolution may be computed in x and y directions 
separately, providing a further opportunity to reduce the computational cost in that 
two 1D FFTs are quicker to execute than one 2D transform. 
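Separability can also be exploited directly in the image domain. The following pure-Python sketch blurs rows and then columns with a truncated 1D kernel. The truncation radius, the border clamping, and all function names are assumptions made for illustration only.

```python
import math


def gaussian_1d(radius, sigma):
    """Truncated, normalized 1D Gaussian with 2 * radius + 1 taps."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def blur_rows(image, kernel):
    """Convolve every row with the 1D kernel, clamping at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    result = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        result.append(new_row)
    return result


def transpose(image):
    return [list(column) for column in zip(*image)]


def separable_blur(image, radius, sigma):
    """Apply the 1D kernel along x, then along y."""
    kernel = gaussian_1d(radius, sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))
```

Each pixel then costs two passes of 2r + 1 multiplications instead of one pass of (2r + 1)² for a kernel of radius r.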


7.1.5 VALIDATION 


To allow more informed decisions as to which operator is suitable for which 
task, there is a need for validation studies. At the time of writing, only two such 
studies exist [20,31,68], with more beginning to emerge [73]. In addition, the 
CIE has formed a technical committee (TC8-08) to study the issue of how 
tone-reproduction operators might be validated [60].² 

Currently, visual comparison remains one of the most practical ways of assessing 
tone-reproduction operators, but this approach is not without pitfalls. In particular, 
the choice of parameter settings for each of the operators will have a large impact on 
the outcome of such visual comparisons. To avoid subconscious comparisons, we 
have purposely chosen to use different images for each of the operators discussed in 
this and the following chapter. Until proper validation studies start to show a pattern 
and reach agreement on which operators perform well, visual comparison, taste, 
and other secondary considerations will dominate the decision-making process. 

2 See also the web site for CIE Division 8: Image Technology at www.colour.org. 

Figure 7.4: Top row: input image and Gaussian filter kernel. Middle row: Fourier representation 
of the image and the filter kernel. After pointwise multiplication of these images followed by the 
inverse Fourier transform, the Gaussian blurred image shown at the bottom is obtained. 


7.2 GLOBAL OPERATORS 


The simplest functions that reduce an image’s dynamic range treat each pixel inde- 
pendently. Such functions usually take for each pixel its value and a globally derived 
quantity, usually an average of some type (see Section 6.4). 

Global operators share one distinct advantage: they are computationally efficient. 
Many of them may be executed in real time. Because global operators are generally 
much faster than any of the other classes of operators, applications that require this 
level of performance should consider global operators over all others. 

On the other hand, if the dynamic range of an image is extremely high the 
global tone-reproduction operators may not always preserve visibility as well as 
other operators. However, there are also differences among global operators. Some 
operators are able to handle a larger class of HDR images than others. This issue is 
discussed further in the following sections. 


7.2.1 MILLER BRIGHTNESS-RATIO-PRESERVING 
OPERATOR 


The first global tone-reproduction operator we know of was documented in 1984 
by Miller and colleagues [186]. They aimed to introduce the field of computer 
graphics to the lighting engineering community. For rendering algorithms to be 
useful in lighting design, they should output radiometric or photometric quantities, 
rather than arbitrarily scaled pixel intensities. Physically based rendering algorithms 
therefore produce imagery that is typically not directly displayable on LDR display 
devices, thus requiring tone reproduction for display (see Chapter 4). 

As a result, Miller et al. developed a tone-reproduction operator that aims at 
preserving the sensation of brightness of the image before and after dynamic range 
reduction. Brightness is a complex function of both luminance and spatial 
configuration that may be simplified for the purposes of this work. Here, brightness $Q$ is 
approximated as a power function of luminance $L_v$, as follows. 

$$Q = k L_v^{b}$$

Miller et al. assert that the visual equivalence of an image before and after dynamic 
range reduction may be modeled by keeping brightness ratios constant. Thus, for 
two elements $Q_1$ and $Q_2$ to be visually equivalent to their compressed counterparts 
$Q'_1$ and $Q'_2$, their ratios should be constant. That is, 

$$\frac{Q_1}{Q_2} = \frac{Q'_1}{Q'_2}$$

It should be noted that visual equivalence between pairs of brightness values is not 
the same as being equal (i.e., in general, $Q_1$ will be different from $Q'_1$, and $Q_2$ will 
not be equal to $Q'_2$). 

The procedure for preparing an image for display starts by converting an image 
to brightness values $Q_w(x, y)$. Then the maximum brightness of the image $Q_{w,\max}$ 
is determined. The image's brightness values are then normalized by dividing each 
pixel's brightness representation by the image's maximum brightness. 

The display device's maximum brightness $Q_{d,\max}$ is then determined from its 
maximum luminance value using the same luminance-brightness relationship. 
Display brightnesses $Q_d(x, y)$ are determined using the following. 

$$Q_d(x, y) = \frac{Q_w(x, y)}{Q_{w,\max}}\, Q_{d,\max}$$


These brightnesses are then converted to luminances by applying the inverse of 
the brightness function. There exist different formulas for determining brightness 
values from luminances. Miller et al. experimented with three different 
formulations, and determined that the one proposed by Stevens [121] produced the most 
plausible results. Fitting functions to Stevens' psychophysical data, Miller created 
functional forms for $k$ and $b$, as follows. 

$$b = 0.338\, L_v^{0.034}$$
$$k = -1.5 \log_{10}(L_v) + 6.1$$

The relationship between luminance and brightness then becomes 

$$Q = \left(-1.5 \log_{10}(L_v) + 6.1\right) L_v^{\,0.338\, L_v^{0.034}}$$

A plot of this function is shown in Figure 7.5. The function monotonically increases 
until about 2,000 cd/m², and then steeply declines. This is a result of fitting the 
previous function to psychophysical data that are only valid for a limited range (up 
to about 1,000 cd/m²). 

As Miller et al.'s work is aimed at lighting design, their operator is suitable for 
compressing luminance ranges that are typically found in indoor situations. They 
assert that actual room luminances range between 100 and 1,000 cd/m², whereas 
display devices are typically limited to the range of 1 to 33 cd/m². Current display 
devices can be brighter, though. Most tone-reproduction operators requiring an 
estimate of the maximum display luminance use values in the range of 30 to 100 
cd/m². 

A second implication of the sharp decline of the previous function is that this 
brightness equation is not analytically invertible, which is necessary for Miller’s 
operator to be useful. However, the inverse of this function may be approximated 
with a lookup table of sufficiently high resolution, allowing us to experiment with 
this tone-reproduction operator. 
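The lookup-table inversion can be sketched as follows. This is a minimal illustration rather than the implementation used for the figures: the table range (0.001 to 1,000 cd/m²), its resolution, the logarithmic spacing, and all function names are assumptions. Luminances must be strictly positive because the brightness function takes a logarithm.

```python
import math
from bisect import bisect_left


def brightness(lum):
    """Miller's fitted brightness Q = k * L^b for a luminance in cd/m^2."""
    b = 0.338 * lum ** 0.034
    k = -1.5 * math.log10(lum) + 6.1
    return k * lum ** b


def build_inverse_table(l_min=1e-3, l_max=1000.0, steps=4096):
    """Tabulate (Q, L) pairs over a range where Q increases monotonically."""
    lums = [l_min * (l_max / l_min) ** (i / (steps - 1)) for i in range(steps)]
    return [brightness(l) for l in lums], lums


def luminance_from_brightness(q, table):
    """Invert the brightness function by linear interpolation in the table."""
    qs, lums = table
    i = bisect_left(qs, q)
    if i <= 0:
        return lums[0]
    if i >= len(qs):
        return lums[-1]
    t = (q - qs[i - 1]) / (qs[i] - qs[i - 1])
    return lums[i - 1] + t * (lums[i] - lums[i - 1])


def miller_operator(world_lums, display_max_lum=100.0, table=None):
    """Map world luminances to display luminances, keeping brightness ratios."""
    table = table or build_inverse_table()
    q_w_max = max(brightness(l) for l in world_lums)
    q_d_max = brightness(display_max_lum)
    return [luminance_from_brightness(brightness(l) / q_w_max * q_d_max, table)
            for l in world_lums]


if __name__ == "__main__":
    print(miller_operator([1.0, 10.0, 100.0, 1000.0]))
```

The brightest input pixel lands on the maximum display luminance, and all other pixels keep their brightness ratio to it.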

For practical purposes, we normalize each image within the range between 0 
and 1,000 cd/m². This places an assumption on the input image, which is that it 
depicts an indoor scene. This is not unreasonable, in that the operator is not suitable 
for images with a higher dynamic range. 

The maximum display luminance depends on the display device, and therefore 
the maximum brightness of the display device for which the operator should 
compress will also vary. To simulate tone reproduction for different display devices, we 
make $Q_{d,\max}$, the maximum display brightness, a user parameter. Varying this 
parameter for an image with a dynamic range comparable to an indoor scene, the set 
of images in Figure 7.6 was produced. This parameter behaves as expected: higher 
maximum monitor brightness values result in lighter images. 

Figure 7.5: Miller's mapping of log luminance to brightness. This mapping is valid over its 
monotonically increasing domain (i.e., up to a luminance of about 1,000 cd/m²). 
[Plot: brightness (vertical axis) against log(L) from 0 to 4 (horizontal axis).] 

Other than the invertibility of the brightness function, which is solved by em- 
ploying a lookup table, this operator is both simple to implement and fast. However, 
the previously cited limitations make this operator mainly of interest for historical 
reasons. 


Figure 7.6: Using Miller's operator, the maximum monitor brightness $Q_{\max}$ was varied from 
10 (top left) in increments of 10 in subsequent images. 


7.2.2 TUMBLIN-RUSHMEIER 
BRIGHTNESS-PRESERVING OPERATOR 


Where Miller et al. were the first to introduce computer graphics to the field of light- 
ing design, focusing on tone reproduction to accomplish this goal, it was Tumblin 
and Rushmeier who introduced the problem of tone reproduction to the field of 
computer graphics in 1993 [131]. Tumblin and Rushmeier also based their work 
on Stevens’ psychophysical data, realizing that the human visual system is already 
solving the dynamic range reduction problem. 

The Tumblin-Rushmeier operator exists in two different forms: the original 
operator [131] and a revised version [132] (which corrects a couple of shortcomings 
including the fact that it was calibrated in sieverts, a unit that is not in wide use). 
For this reason, we limit our discussion to the revised Tumblin-Rushmeier operator 
and will refer to it simply as the Tumblin-Rushmeier operator. 

Although the Tumblin-Rushmeier operator is based on the same psychophysical 
data as Miller's operator, the brightness function is stated slightly differently, as 
follows. 

$$Q(x, y) = C_0 \left(\frac{L(x, y)}{L_a}\right)^{\gamma}$$

Here, $Q$ is brightness (or perceived luminance), measured in brils. $L$ is luminance 
in cd/m² and $L_a$ is the adaptation luminance, also measured in cd/m². The 
constant $C_0 = 0.3698$ is introduced to allow the formula to be stated in SI units. Finally, 
$\gamma$ is a measure of contrast sensitivity and is itself a function of the adaptation 
luminance $L_a$. 

This function may be evaluated for an HDR image as well as for the intended 
display device. This leads to two sets of brightness values as a function of input 
luminances (or world luminances) and display luminances. In the following, the 
subscripts w and d indicate world quantities (measured or derived from the HDR 
image) and display quantities. Whereas Miller et al. conjecture that image and dis- 
play brightness ratios should be matched, Tumblin and Rushmeier simply equate 
the image and display brightness values, as follows. 


$$Q_w(x, y) = C_0 \left(\frac{L_w(x, y)}{L_{wa}}\right)^{\gamma(L_{wa})}$$

$$Q_d(x, y) = C_0 \left(\frac{L_d(x, y)}{L_{da}}\right)^{\gamma(L_{da})}$$

$$Q_w(x, y) = Q_d(x, y)$$

The gamma function $\gamma(L)$ models Stevens' human contrast sensitivity for the image 
and the display by plugging in $L_{wa}$ and $L_{da}$, respectively, and is given by 

$$\gamma(L) = \begin{cases} 2.655 & \text{for } L > 100\ \mathrm{cd/m^2} \\ 1.855 + 0.4 \log_{10}(L + 2.3 \cdot 10^{-5}) & \text{otherwise.} \end{cases}$$

These equations may be solved for $L_d(x, y)$, the display luminance, which is the 
quantity we wish to display. The result is 

$$L_d(x, y) = L_{da} \left(\frac{L_w(x, y)}{L_{wa}}\right)^{\gamma(L_{wa}) / \gamma(L_{da})}$$

The adaptation luminances are $L_{da}$ for the display and $L_{wa}$ for the image. The 
display adaptation luminance is typically between 30 and 100 cd/m², although this 
number will be higher when HDR display devices are used. The image adaptation 
luminance is given as the log average luminance $L_{wa}$ (Equation 7.1). The 
mid-range scene luminances now map to mid-range display luminances close to $L_{da}$, 
which for dim scenes results in a uniform gray appearance in the display. This may 
be remedied by introducing a scale factor $m(L_{wa})$, which depends on the world 
adaptation level $L_{wa}$, as follows. 

$$m(L_{wa}) = \left(\sqrt{C_{\max}}\right)^{\gamma_{wd} - 1}$$

$$\gamma_{wd} = \frac{\gamma_w}{1.855 + 0.4 \log(L_{da})}$$


Here, $C_{\max}$ is the maximum displayable contrast, which is typically between 30 and 
100 for an LDR display device. The full operator is then given by 

$$L_d(x, y) = m\, L_{da} \left(\frac{L_w(x, y)}{L_{wa}}\right)^{\gamma(L_{wa}) / \gamma(L_{da})}$$

Figure 7.7: Mapping of world luminances to display luminances for the Tumblin-Rushmeier 
operator. 


For suitably chosen input parameters, a plot of this function is given in Figure 7.7. 
As this operator is calibrated in SI units, the image to be tone mapped also needs 
to be specified in SI units. For an image in unknown units, we experimented with 
different scale factors prior to tone reproduction, the results of which are shown in 
Figure 7.8. This image was scaled by factors of 0.1, 1, 10, 100, and 1,000, with 
the scaling resulting in progressively lighter images. For this particular image, a scale 
factor of close to 1,000 would be optimal. Our common practice of normalizing the 
image, applying gamma correction, and then multiplying by 255 was abandoned 
for this image sequence, because this operator already includes a display gamma 
correction step. 

Figure 7.8: In this sequence of images, the input luminances were scaled by factors of 0.1, 
1, 10, 100, and 1,000 prior to applying Tumblin and Rushmeier's revised tone-reproduction 
operator. 


In summary, the revised Tumblin-Rushmeier tone-reproduction operator is 
based on the same psychophysical data as Miller's operator, but the crucial difference 
is that Miller et al. aim to preserve brightness ratios before and after compression 
whereas Tumblin and Rushmeier attempt to preserve the brightness values themselves. 
In our opinion, the latter leads to a useful operator that produces plausible results, 
provided the input image is specified in cd/m². If the image is not specified in 
cd/m², it should be converted to SI units. In that case, the image may be pre-scaled 
by a factor that may be determined by trial and error, as shown in Figure 7.8. 
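The revised operator, including the scale factor m, can be sketched per pixel as shown below. This is a hedged sketch rather than a reference implementation: the function names and the default values for the display adaptation luminance and maximum contrast are hypothetical, the world adaptation luminance is passed in rather than computed, and a base-10 logarithm is assumed in the denominator of the expression for γ_wd.

```python
import math


def gamma(lum):
    """Stevens-based contrast sensitivity as a function of adaptation luminance."""
    if lum > 100.0:
        return 2.655
    return 1.855 + 0.4 * math.log10(lum + 2.3e-5)


def tumblin_rushmeier(l_w, l_wa, l_da=50.0, c_max=50.0):
    """Display luminance for world luminance l_w (all values in cd/m^2).

    l_wa: world adaptation luminance (for example, the log average).
    l_da: display adaptation luminance; c_max: maximum displayable contrast.
    """
    g_w = gamma(l_wa)
    g_d = gamma(l_da)
    # Assumption: log base 10 in the gamma_wd denominator.
    g_wd = g_w / (1.855 + 0.4 * math.log10(l_da))
    m = math.sqrt(c_max) ** (g_wd - 1.0)
    return m * l_da * (l_w / l_wa) ** (g_w / g_d)


if __name__ == "__main__":
    for scale in (0.1, 1.0, 10.0, 100.0, 1000.0):
        # Pre-scale an uncalibrated mid-gray value, as in the experiment above.
        print(scale, tumblin_rushmeier(0.2 * scale, l_wa=0.2 * scale))
```

Under these assumptions, a pixel at the world adaptation level maps below the display adaptation level for dim scenes and slightly above it for bright scenes, which is the purpose of the scale factor.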


7.2.3 WARD CONTRAST-BASED SCALE FACTOR 


Whereas both Miller’s and Tumblin and Rushmeier’s operators aim at preserving the 
sensation of brightness, other operators focus less on brightness perception and at- 
tempt to preserve contrasts instead. An early example is Ward’s contrast-based scale 
factor [139]. The model matches JNDs one might discern in the image with JNDs an 
observer of an LDR display device may distinguish. Thus, differences are preserved 
without spending the limited number of display steps on differences undetectable 
by the human visual system. The operator maps image or world luminances to dis- 
play luminances linearly, as follows. 


$$L_d(x, y) = m\, L_w(x, y)$$

The scale factor $m$ is chosen to match threshold visibility in image and display. This 
requires a threshold-versus-intensity (TVI) function $t(L_a)$, which gives the threshold 
luminance that is just visible at adaptation luminance $L_a$. We also need to estimate 
the adaptation level for an observer of the image ($L_{wa}$), as well as for an observer 
viewing the display ($L_{da}$). The scale factor $m$ may then be chosen such that 

$$t(L_{da}) = m\, t(L_{wa}).$$

Solving for $m$ yields 

$$m = \frac{t(L_{da})}{t(L_{wa})}.$$

Based on Blackwell's studies [11], this yields the following scale factor. 

$$m = \frac{1}{L_{d,\max}} \left(\frac{1.219 + (L_{d,\max}/2)^{0.4}}{1.219 + L_{wa}^{0.4}}\right)^{2.5}$$


In this equation, the display adaptation level is estimated to be half the maximum 
display luminance, which is specified as $L_{d,\max}$. The maximum display luminance 
should be specified by the user, and is typically in the range of 30 to 100 cd/m². The 
world adaptation level may be estimated as the log average of the image's luminance 
values, as follows. 

$$L_{wa} = \exp\!\left(\frac{1}{N} \sum_{x, y} \log\!\left(10^{-8} + L_w(x, y)\right)\right)$$


In this equation, we sum the log luminance values of all pixels and add a small offset 
to avoid the singularity that occurs for black pixels. This log average computation is 
slightly different from the one used in the preceding section. The small offset could 
be omitted, but then the summation should only include non-zero pixels. Because 
the offset is small, the difference between the two log average computations should 
also be small. The division is by N, the number of pixels in the image. 
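The scale factor and the log average described above translate into only a few lines of Python. This is a minimal sketch with hypothetical function names, operating on a flat list of luminances in cd/m². The default maximum display luminance of 100 cd/m² is an assumption taken from the typical range quoted in the text.

```python
import math


def log_average(lums, offset=1e-8):
    """Log average luminance; the small offset avoids log(0) for black pixels."""
    return math.exp(sum(math.log(offset + l) for l in lums) / len(lums))


def ward_scale_factor(l_wa, l_d_max=100.0):
    """Contrast-based scale factor m; display adaptation is l_d_max / 2."""
    numerator = 1.219 + (l_d_max / 2.0) ** 0.4
    denominator = 1.219 + l_wa ** 0.4
    return (numerator / denominator) ** 2.5 / l_d_max


def ward_operator(lums, l_d_max=100.0):
    """Scale all world luminances linearly by the same factor m."""
    m = ward_scale_factor(log_average(lums), l_d_max)
    return [m * l for l in lums]


if __name__ == "__main__":
    print(ward_operator([0.5, 5.0, 50.0, 500.0]))
```

Because the mapping is a single multiplication, pre-scaling the input changes only the adaptation estimate and therefore the overall brightness of the result.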

As with tone-reproduction operators discussed earlier, the input image needs to 
be specified in SI units. In Figure 7.9 we show the effect of pre-scaling an uncal- 
ibrated image with various values. Scaling the image to larger values produces a 
brighter result, which should not be surprising. 

As this operator scales the input linearly, choices of pre-scaling and values of 
maximum display luminance amount to choosing which luminances in the image 
are mapped to middle gray on the display device. As such, the images shown in 
Figure 7.9 are effectively brighter or darker versions of each other. 


7.2.4 FERWERDA MODEL OF VISUAL ADAPTATION 


The concept of matching JNDs as explored by Ward was also used by Ferwerda et 
al. in their operator. They based their operator on different psychophysical data, 
with a somewhat different functional shape as a result. The operator, however, is 
still intrinsically a linear mapping between world and display luminances. Whereas 
Ward's contrast-based scale factor incorporates only photopic lighting conditions, 
Ferwerda et al. added a scotopic component. They also modeled the loss of visual 
acuity under scotopic lighting, as well as the process of light and dark adaptation, 
which takes place over time. 

Figure 7.9: Scaling of an image given in uncalibrated units prior to application of Ward's 
contrast-based scale factor. From left to right we scaled the image by factors of 0.01, 0.1, and 1. 

In Ferwerda's operator, display intensities are computed from world intensities 
by multiplying the latter with a scale factor $m$ and adding an offset $b$, which 
allows contrast and overall brightness to be controlled separately. This is calculated as 
follows. 

$$R_d(x, y) = d\,(m R_w(x, y) + b L_w(x, y))$$
$$G_d(x, y) = d\,(m G_w(x, y) + b L_w(x, y))$$
$$B_d(x, y) = d\,(m B_w(x, y) + b L_w(x, y))$$
$$m = \frac{t_p(L_{da})}{t_p(L_{wa})}$$
$$b = \frac{t_s(L_{da})}{t_s(L_{wa})}$$
$$d = \frac{L_{\max}}{L_{d,\max}}$$


This operator thus scales each of the three red, green, and blue channels by a fac- 
tor m, but adds an achromatic term that depends on the pixel’s luminance. The 
scale factor m governs photopic conditions, whereas the b term handles scotopic 
conditions. Both depend on TVI functions. 

Modeling cones, which are active under photopic lighting conditions, the TVI 
function $t_p(L_a)$ is approximated by the following. 

$$\log_{10} t_p(L_a) = \begin{cases} -0.72 & \text{if } \log_{10}(L_a) \le -2.6 \\ \log_{10}(L_a) - 1.255 & \text{if } \log_{10}(L_a) \ge 1.9 \\ (0.249 \log_{10}(L_a) + 0.65)^{2.7} - 0.72 & \text{otherwise} \end{cases}$$

For the rods, active under scotopic lighting conditions, the TVI function $t_s(L_a)$ is 
approximated by the following. 

$$\log_{10} t_s(L_a) = \begin{cases} -2.86 & \text{if } \log_{10}(L_a) \le -3.94 \\ \log_{10}(L_a) - 0.395 & \text{if } \log_{10}(L_a) \ge -1.44 \\ (0.405 \log_{10}(L_a) + 1.6)^{2.18} - 2.86 & \text{otherwise} \end{cases}$$

For both the photopic and scotopic range, a separate scale factor ($m_p$ and $m_s$, 
respectively) may be computed using the previous TVI curves, as follows. 

$$m_p = \frac{t_p(L_{da})}{t_p(L_{wa})}$$

$$m_s = \frac{t_s(L_{da})}{t_s(L_{wa})}$$


These scale factors depend on the display adaptation luminance $L_{da}$ and the world 
(or image adaptation) luminance $L_{wa}$. The display adaptation luminance may be 
estimated to be half the maximum display luminance. For typical LDR displays, the 
maximum display luminance is about 100 cd/m², and thus the display adaptation 
luminance is estimated as 50 cd/m². For this operator, the world adaptation 
luminance is approximated by half the maximum world luminance $L_{\max}$. 
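The two TVI curves and the resulting scale factors can be sketched as follows. The function names are hypothetical, the branch boundaries are treated as inclusive (an assumption, since the curves are nearly continuous there), and the example adaptation levels simply follow the half-of-maximum estimates given above.

```python
import math


def log_tvi_photopic(log_la):
    """log10 of the cone threshold t_p, given log10 of the adaptation luminance."""
    if log_la <= -2.6:
        return -0.72
    if log_la >= 1.9:
        return log_la - 1.255
    return (0.249 * log_la + 0.65) ** 2.7 - 0.72


def log_tvi_scotopic(log_la):
    """log10 of the rod threshold t_s, given log10 of the adaptation luminance."""
    if log_la <= -3.94:
        return -2.86
    if log_la >= -1.44:
        return log_la - 0.395
    return (0.405 * log_la + 1.6) ** 2.18 - 2.86


def scale_factors(l_da, l_wa):
    """Photopic and scotopic scale factors (m_p, m_s) as threshold ratios."""
    log_da, log_wa = math.log10(l_da), math.log10(l_wa)
    m_p = 10.0 ** (log_tvi_photopic(log_da) - log_tvi_photopic(log_wa))
    m_s = 10.0 ** (log_tvi_scotopic(log_da) - log_tvi_scotopic(log_wa))
    return m_p, m_s


if __name__ == "__main__":
    l_d_max, l_w_max = 100.0, 2000.0
    print(scale_factors(l_d_max / 2.0, l_w_max / 2.0))
```

Working with the logarithm of the thresholds keeps the piecewise definitions in the same form as they are stated and turns each ratio into a difference of exponents.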

In addition to mapping luminances to a displayable range, tone reproduction 
may attempt to preserve other aspects of human vision across viewing conditions. 
One of these is visual acuity. Under scotopic lighting conditions, the human visual 
system may not resolve as much detail as under photopic lighting conditions. Fer- 
werda et al. outline a solution that may be applied in addition to the previously 
cited mapping. This involves removing from the displayable image frequencies that 
would not have been resolvable by the world observer. This may be accomplished in 
the Fourier domain by removing frequencies higher than the threshold frequency 
for the world observer, as follows. 


$$f^{*}(w_c(L_{wa})) = \frac{t(L_{wa})}{L_{wa}}$$


As with Ward’s contrast-based scale factor and Tumblin and Rushmeier’s operator, 
Ferwerda’s operator is based on psychophysical measurements, and is calibrated in 
SI units. An example of different pre-scaling factors for an uncalibrated image is 
shown in Figure 7.10, which reveals that for images that are scaled to small values 
the range of input values covers the scotopic range (in which vision is achromatic). 
For larger scale factors, the HDR data covers the mesopic and photopic ranges, 
where color is retained. Note that in this figure the change in visual acuity as a 
function of world adaptation level was not modeled. 


Figure 7.10: Pre-scaling factors of 0.1, 1, 10, 33, 66, and 100 applied prior to tone 
mapping with Ferwerda's operator. 

Ferwerda's operator was later adapted by Durand and Dorsey [22] for the 
purpose of interactive tone reproduction. They also proposed various computationally 
efficient extensions that allow modeling the blue shift (associated with scotopic 
lighting conditions), light adaptation, and chromatic adaptation. 

Although Ferwerda’s operator is a linear scale factor (like Ward’s contrast-based 
scale factor), it models visual acuity and includes an achromatic component that 
models scotopic vision. It is therefore a more complete model than Ward’s. How- 
ever, it is still a linear model, which means that the maximum dynamic range that 
may be successfully tone mapped for display on an LDR device is limited. For very 
high dynamic range images, a nonlinear mapping may be a better approach. 


7.2.5 LOGARITHMIC AND EXPONENTIAL MAPPINGS 


Of all nonlinear mappings, logarithmic and exponential mappings are among the 
most straightforward. Their main use is in providing a baseline result against which 
all other operators may be compared. After all, any other operator is likely to be 
more complex and we may expect other operators to provide improved visual 
performance compared with logarithms and exponential mappings (although we 
would like to keep the notion of visual performance deliberately vague). 

For medium-dynamic-range images (i.e., images with a dynamic range some- 
what higher than can be accommodated by current LDR display devices), these 
very simple solutions may in fact be competitive with more complex operators. The 
logarithm is a compressive function for values larger than 1, and therefore range 
compression may be achieved by mapping luminances as follows. 

$$L_d(x, y) = \frac{\log_{10}(1 + L_w(x, y))}{\log_{10}(1 + L_{\max})} \tag{7.5}$$


A second mapping converts world luminances to display luminances by means of 
the exponential function [33], as follows. 


$$L_d(x, y) = 1 - \exp\!\left(-\frac{L_w(x, y)}{L_{av}}\right) \tag{7.6}$$


This function is bounded between 0 for black pixels and 1 for infinitely bright pixels. 
Because world luminances are always smaller than infinity, the resulting display 
luminances $L_d(x, y)$ will in practice never quite reach 1. Subsequent normalization 
would therefore somewhat expand the range of display values. 

The division by the average luminance $L_{av}$ in the previous exponential 
function causes pixels with this value to be mapped to $1 - 1/e \approx 0.63$. Because this 
value is slightly above 0.5, the arithmetic average is employed rather than the more 
commonly used log average luminance. 

Figure 7.11: Left: logarithmic mapping using $L_{\max}$. Right: exponential mapping using $L_{av}$. 
Compare with Figure 7.12, where the roles of maximum and average luminance are reversed. 

Figure 7.11 shows example results of the logarithmic and exponential mappings. 
Both images successfully map world luminances to display luminances. However, 
the logarithmic mapping produces an image that is somewhat dull. The exponential 
mapping, on the other hand, is overall much lighter. The original scene appeared to 
sit somewhere between these two renditions, and thus neither algorithm produced 
a displayable image that was faithful to the original scene. 
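Both mappings are simple enough to state directly in code. In this sketch the anchor luminance is a parameter, so that either the maximum or the arithmetic average of the image may be supplied. The function names and the sample luminances are hypothetical.

```python
import math


def log_mapping(l_w, anchor):
    """Logarithmic mapping; the anchor luminance maps to a display value of 1."""
    return math.log10(1.0 + l_w) / math.log10(1.0 + anchor)


def exp_mapping(l_w, anchor):
    """Exponential mapping; the anchor luminance maps to 1 - 1/e (about 0.63)."""
    return 1.0 - math.exp(-l_w / anchor)


if __name__ == "__main__":
    lums = [0.05, 0.8, 12.0, 150.0, 3000.0]
    l_max = max(lums)
    l_av = sum(lums) / len(lums)  # arithmetic average
    print([round(log_mapping(l, l_max), 3) for l in lums])
    print([round(exp_mapping(l, l_av), 3) for l in lums])
```

Passing a different anchor to either function is all that is needed to compare the effect of the anchor value with the effect of the curve shape.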

The differences between the images produced with logarithmic and exponential 
mappings could be due to the shape of the compression curve, but we also note 
that the logarithmic mapping is anchored to the maximum luminance value in 
the image, whereas the exponential mapping uses the average luminance value. 
Figure 7.12 shows the same two algorithms, but now we have swapped $L_{av}$ and 
$L_{\max}$ in both operators. This small change to these operators has also caused their 
appearance to be reversed. 

Figure 7.12: Left: logarithmic mapping using $L_{av}$. Right: exponential mapping using $L_{\max}$. 
Compare with Figure 7.11. 

Plots of logarithmic and exponential mappings with either $L_{av}$ or $L_{\max}$ used 
to anchor the mapping are shown in Figure 7.13. The functional form, as well as 
the choice of anchor value, has a significant impact on the shape of the 
compressive function. Although more experimentation would be required to draw definite 
conclusions, it appears that the value chosen to anchor the mapping ($L_{\max}$ or 
$L_{av}$) has the more profound effect on the result. 

In summary, logarithmic and exponential mappings are among the most 
straightforward nonlinear mappings. For images with a dynamic range that only 
just exceeds the capabilities of the chosen display device, these approaches may well 
suffice. For images with a higher dynamic range, however, other approaches may 
be more suitable. 

Figure 7.13: Plots of Equations 7.5 and 7.6 using $L_{av}$ and $L_{\max}$. [Curves: logarithmic using 
$L_{av}$, logarithmic using $L_{\max}$, exponential using $L_{av}$, exponential using $L_{\max}$.] 


7.2.6 DRAGO LOGARITHMIC MAPPING 


Building upon the observation that the human visual system to a first approxima- 
tion uses a logarithmic response to intensities, Drago et al. show how logarithmic 
response curves may be extended to handle a wider dynamic range than the simple 
operators discussed in the preceding section [21]. 

The operator effectively applies a logarithmic compression to the input 
luminances, but the base of the logarithm is adjusted according to each pixel's value. 
The base is varied between 2 and 10, allowing contrast and detail preservation in 
dark and medium luminance regions while still compressing light regions by larger 
amounts. A logarithmic function of arbitrary base $b$ may be constructed from 
logarithmic functions with a given base (for instance, base 10), as follows. 

$$\log_b(x) = \frac{\log_{10}(x)}{\log_{10}(b)}$$

To smoothly interpolate between different bases, use is made of Perlin and Hoffert's 
bias function. In this function, the amount of bias is controlled by user parameter $p$ 
[99], as follows. 

$$\mathrm{bias}_p(x) = x^{\log_{10}(p) / \log_{10}(0.5)}$$


The basic tone-reproduction curve is the same as the logarithmic mapping pre- 
sented in the preceding section, but with a base b (which is a function of each 
pixel’s luminance), as follows: 


$$L_d(x, y) = \frac{\log_b(1 + L_w(x, y))}{\log_b(1 + L_{w,\max})}$$


To smoothly interpolate between different bases, the preceding three equations are 
combined as follows. 

$$L_d(x, y) = \frac{L_{d,\max}/100}{\log_{10}(1 + L_{w,\max})} \cdot \frac{\log_{10}(1 + L_w(x, y))}{\log_{10}\!\left(2 + 8 \left(\dfrac{L_w(x, y)}{L_{w,\max}}\right)^{\log_{10}(p)/\log_{10}(0.5)}\right)}$$

The constants 2 and 8 bound the chosen base between 2 and 10. The maximum 
display luminance $L_{d,\max}$ is display dependent and should be specified by the user. 
In most cases, a value of 100 cd/m² would be appropriate. 
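A per-pixel sketch of the combined mapping is given below. The function names are hypothetical and the defaults (p = 0.85 and a maximum display luminance of 100 cd/m²) follow the values suggested in the text. With those defaults, the maximum world luminance maps to a display value of 1.

```python
import math


def bias(x, p):
    """Perlin and Hoffert's bias function; bias(0.5, p) equals p."""
    return x ** (math.log(p) / math.log(0.5))


def drago(l_w, l_w_max, p=0.85, l_d_max=100.0):
    """Logarithmic mapping whose base varies from 2 (dark) to 10 (brightest)."""
    base = 2.0 + 8.0 * bias(l_w / l_w_max, p)
    scale = (l_d_max / 100.0) / math.log10(1.0 + l_w_max)
    return scale * math.log10(1.0 + l_w) / math.log10(base)


if __name__ == "__main__":
    l_w_max = 500.0
    for l_w in (0.1, 1.0, 10.0, 100.0, 500.0):
        print(l_w, round(drago(l_w, l_w_max), 4))
```

The ratio of logarithms inside bias does not depend on the base used, so natural logarithms are used there for convenience.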

This leaves the bias parameter $p$ to be specified. For many practical applications, 
a value between 0.7 and 0.9 produces plausible results, with a value of $p = 0.85$ 
being a good initial value. Figure 7.14 shows an image created with different bias 
values. The bias parameter steers the amount of contrast available in the tone-mapped 
image in a well-controlled manner. Higher values result in less contrast and better 
compression, whereas smaller values increase the available contrast. 

Figure 7.14: For Drago's logarithmic mapping, increasing values for the bias parameter p 
result in reduced contrast. In reading order, the bias parameter varied between 0.6 and 1.0 in 
increments of 0.1. 

Figure 7.15: Drago's logarithmic mapping. The discontinuity at $L_w = 0.2$ is most likely 
an artifact of our implementation. 

A curve of this operator is plotted in Figure 7.15. The small discontinuity near 
Lw = 0.2 is in all likelihood an artifact of our implementation. 


7.2.7 REINHARD AND DEVLIN PHOTORECEPTOR 
MODEL 


Logarithmic compression may be viewed as effectively computing a density image. 
The output therefore resembles the information stored in a negative. Although this 
metaphor holds, logarithmic responses are also sometimes attributed to parts of 
the human visual system. This is intrinsically incorrect, although the human visual 
system responds approximately logarithmically over some of its operating range (see 
Section 6.2). Cells in the human visual system communicate with impulse trains, 
wherein the frequency of these impulse trains carries the information. Notable 
exceptions are the first few layers of cells in the retina, which communicate by 
generating graded potentials. In any case, this physiological substrate does not enable 
communication of negative numbers. The impulse frequency may become zero, 
but there is no such thing as negative frequencies. There is also an upper bound to 
realizable impulse frequencies. 

Logarithms, on the other hand, may produce negative numbers. For large input 
values, the output may become arbitrarily large. At the same time, over a range of 
values the human visual system may produce signals that appear to be logarithmic. 
Outside this range, responses are no longer logarithmic but tail off instead. A class of 
functions that approximates this behavior reasonably well are sigmoids, or S-shaped 
functions, as discussed in Chapter 6. When plotted on a log-linear graph, the middle 
portion of such sigmoids is nearly linear and thus resembles logarithmic behavior. 
Moreover, sigmoidal functions have two asymptotes: one for very small values and 
one for large values. 

This gives sigmoidal functions the right mathematical properties to be a possible 
candidate for modeling aspects of the human visual system. Evidence from electro- 
physiology confirms that photoreceptors of various species produce output voltages 
as a function of light intensity received that may be accurately modeled by sigmoids. 

Naka and Rushton were the first to measure photoreceptor responses, and man- 
aged to fit a sigmoidal function to their data [87]. For the purpose of tone repro- 
duction, the following formulation by Hood et al. is practical [52]. 


\[ V(x, y) = \frac{I(x, y)}{I(x, y) + \sigma(I_a(x, y))} \]


Here, I is the photoreceptor input, V is the photoreceptor response, and σ is the
semisaturation constant (which is a function of the receptor’s adaptation level I_a).
The semisaturation constant thus determines to which value of V the adaptation
level is mapped, and therefore provides the flexibility needed to tailor the curve to
the image being tone mapped. For practical purposes, the semisaturation constant
may be computed from the adaptation value I_a as follows.


\[ \sigma(I_a(x, y)) = \bigl(f \, I_a(x, y)\bigr)^m \]


In this equation, f and m are user parameters that need to be specified on a per- 
image basis. The scale factor f may be used to steer the overall luminance of the 
tone-mapped image and can initially be estimated as 1. Images created with different 
values of f are shown in Figure 7.16. 

It should be noted that in electrophysiological studies the exponent m also fea- 
tures and tends to lie between 0.2 and 0.9 [52]. A reasonable initial estimate for 
m may be derived from image measures such as the minimum, maximum, and 
average luminance, as follows. 


\[ m = 0.3 + 0.7\,k^{1.4} \]

\[ k = \frac{L_{\max} - L_{\mathrm{av}}}{L_{\max} - L_{\min}} \]


The parameter k may be interpreted as the key of the image (i.e., a measure of how 
light or dark the image is on average). The nonlinear mapping from k to exponent m 
is determined empirically. The exponent m is used to steer the overall impression 
of contrast, as shown in Figure 7.17. 
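
As a rough illustration of this estimate, the following Python sketch computes the key k
and the exponent m from a list of luminance values. The function name estimate_exponent
is our own, and whether the statistics are best taken on linear or on log luminances is
left to the caller; the sketch simply applies the formula to whatever values it is given.

```python
def estimate_exponent(luminances):
    """Initial estimate of the contrast exponent m from image statistics.

    Hypothetical helper: applies m = 0.3 + 0.7 * k**1.4 with
    k = (Lmax - Lav) / (Lmax - Lmin) to the values passed in.
    """
    l_min = min(luminances)
    l_max = max(luminances)
    if l_max == l_min:
        raise ValueError("key is undefined for a constant image")
    l_av = sum(luminances) / len(luminances)
    # k is near 0 when the average lies close to the maximum,
    # and near 1 when the average lies close to the minimum.
    key = (l_max - l_av) / (l_max - l_min)
    return 0.3 + 0.7 * key ** 1.4
```

For a mostly dark input such as [0, 0, 0, 1] the key is 0.75 and m comes out at roughly
0.77; for a mostly light input such as [0, 1, 1, 1] the key is 0.25 and m is roughly 0.40.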

A tone-reproduction operator may be created by equating display values to the 
photoreceptor output V, as demonstrated by Reinhard and Devlin [108]. Note that 
this operator is applied to each of the red, green, and blue color channels separately. 
This is similar to photoreceptor behavior, in which each of the three different cone 
types is thought to operate largely independently. Also note that the sigmoidal functions
that are part of several color appearance models, such as the Hunt model [55],
CIECAM97 [54], and CIECAM02 [84] (see Section 2.8), are applied independently
to the red, green, and blue channels. This approach may account for the
Hunt effect, which predicts desaturation of colors for both light and dark pixels, 
but not for pixels with intermediate luminances [55]. 

The adaptation level I_a may be computed in traditional fashion, for instance, as
the (log) average luminance of the image. However, additional interesting features,
such as light adaptation and chromatic adaptation, may be modeled by a slightly
more elaborate computation of I_a.
FIGURE 7.16 Luminance control with user parameter f in Reinhard and Devlin’s
photoreceptor-based operator. User parameter f is set here to exp(−8), exp(0) = 1, and
exp(8). The top right-hand image shows the default value.


Strong color casts may be removed by interpolating between the luminance value 
L(x, y) of the pixel and the red, green, and blue values of each pixel I_r|g|b(x, y).
This produces a different adaptation level for each pixel individually, which is con- 
trolled by a user-specified interpolation weight c, as follows. 


\[ I_a(x, y) = c\,I_{r|g|b}(x, y) + (1 - c)\,L(x, y) \]
FIGURE 7.17 The exponent m in Reinhard and Devlin’s operator. Images are shown with
exponent m set to 0.6, 0.7, and 0.8 (in reading order). 


This approach achieves a von Kries style of color correction by setting c equal to
1, whereas no color correction is applied if c equals 0. We also call this color adjustment
“chromatic adaptation.” Its effect is shown in Figure 7.18 for three values
of c.

Similarly, the adaptation level (see Figure 7.19) may be thought of as determined 
by the current light level to which a receptor is exposed, as well as levels to which 
the receptor was exposed in the recent past. Because the eye makes saccadic eye 
FIGURE 7.18 Simulation of chromatic adaptation in Reinhard and Devlin’s photoreceptor-based
operator. The level of chromatic adaptation may be approximated by setting user parameter c
(shown here with values of 0.0, 0.5, and 1.0).


movements, and because there is the possibility of lateral connectivity within the 
retina, we may assume that the current adaptation level is a function of the pixel 
value itself and all other pixels in the image. This has given rise to all manner of 
spatially varying tone-reproduction models (see Section 7.3), but here a much faster 
and simpler solution is used (namely, interpolation between pixel values and global 
averages), as follows. 


\[ I_a(x, y) = a\,I_{r|g|b}(x, y) + (1 - a)\,I^{\mathrm{av}}_{r|g|b} \]
FIGURE 7.19 Simulation of light adaptation in Reinhard and Devlin’s operator. The level of
light adaptation may be approximated by setting user parameter a to 0, 1/3, 2/3, and 1. 
(Image courtesy of the Albin Polasek Museum, Winter Park, Florida.) 




FIGURE 7.20 Luminance mapping by the photoreceptor-based operator for different values of 
user parameter a. 


The interpolation weight a is user specified and controls image appearance, which 
to some extent correlates with light adaptation. Plots of the operator for different 
values of a are presented in Figure 7.20. Light adaptation and chromatic adaptation 
may be combined by bilinear interpolation, as follows. 


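\[ I_a^{\mathrm{local}}(x, y) = c\,I_{r|g|b}(x, y) + (1 - c)\,L(x, y) \]

\[ I_a^{\mathrm{global}} = c\,I^{\mathrm{av}}_{r|g|b} + (1 - c)\,L^{\mathrm{av}} \]

\[ I_a(x, y) = a\,I_a^{\mathrm{local}}(x, y) + (1 - a)\,I_a^{\mathrm{global}} \]
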
This operator is directly inspired by photoreceptor physiology. Using default parameters,
it provides plausible results for a large class of images. Most other results may
be optimized by adjusting the four user parameters. The value of c determines to
what extent any color casts are removed, a and m affect the amount of contrast
in the tone-mapped image, and f makes the overall appearance lighter or darker.
Because each of these parameters has an intuitive effect on the final result, manual 
adjustment is fast and straightforward. 
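
The pieces of this operator can be put together in a few lines. The Python sketch below
processes a single pixel: it forms the adaptation level by the bilinear interpolation
described in this section, derives the semisaturation constant, and evaluates the sigmoid
for each color channel. Function and argument names are ours, inputs are assumed to be
non-negative, and the default parameter values are placeholders chosen for illustration
rather than recommended settings.

```python
def adaptation_level(rgb, lum, rgb_av, lum_av, a, c):
    """Per-channel adaptation level I_a (weights a and c are user parameters)."""
    levels = []
    for channel, channel_av in zip(rgb, rgb_av):
        local = c * channel + (1.0 - c) * lum           # pixel-based term
        global_ = c * channel_av + (1.0 - c) * lum_av   # image-average term
        levels.append(a * local + (1.0 - a) * global_)
    return levels


def photoreceptor(rgb, lum, rgb_av, lum_av, f=1.0, m=0.6, a=1.0, c=0.0):
    """Tone map one pixel with V = I / (I + (f * I_a)**m), channel by channel."""
    result = []
    levels = adaptation_level(rgb, lum, rgb_av, lum_av, a, c)
    for value, i_a in zip(rgb, levels):
        sigma = (f * i_a) ** m          # semisaturation constant
        denom = value + sigma
        result.append(value / denom if denom > 0.0 else 0.0)
    return result
```

With a = 1 and c = 0 the adaptation level of each channel is simply the pixel's own
luminance, so a gray pixel of value 1 with f = 1 maps to 0.5 regardless of m. Setting
a = 0 replaces the pixel luminance with the image average, which turns the sketch into a
purely global curve.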


7.2.8 WARD HISTOGRAM ADJUSTMENT 


Most global operators define a parametric curve with a few parameters that are 
estimated from the input image or that need to be specified by the user. Histogram 
enhancement techniques provide a mechanism for adjusting the mapping in a more 
fine-grained, albeit automatic, manner. Image enhancement techniques manipulate 
images that are already LDR to maximize visibility or contrast. On the other hand, 
Ward et al. borrow key ideas from histogram enhancement techniques to reproduce 
HDR images on LDR displays, simulating both visibility and contrast [142]. Their 
technique is termed histogram adjustment. 

The simulation of visibility and contrast serves two purposes. First, the subjective 
correspondence between the real scene and its displayed image should be preserved 
so that features are only visible in the tone-mapped image if they were also visible 
in the original scene. Second, the subjective impression of contrast, brightness, and 
color should be preserved. 

The histogram adjustment operator computes a histogram of a density image
(i.e., the logarithm of each pixel’s luminance is taken first) to assess the distribution of
pixels over all possible luminance values. The shape of its associated cumulative histogram may be
directly used to map luminance values to display values. However, further restric- 
tions are imposed on this mapping to preserve contrast based on the luminance 
values found in the scene and on how the human visual system would perceive 
those values. As a postprocessing step, models of glare, color sensitivity, and visual 
acuity may further simulate aspects of human vision. 

The histogram is calculated by first downsampling the image to a resolution 
that corresponds roughly to 1 degree of visual angle. Then the logarithm of the 
downsampled image is taken and its histogram is computed. The minimum and 
maximum log luminance values are taken to define the range of the histogram,
with the exception that if the minimum log luminance value is smaller than −4
this value is used as the lower bound of the histogram. This exception models the
lower threshold of human vision. The number of bins in the histogram is 100,
which in practice provides a sufficiently accurate result. If f(b_i) counts the number
of pixels that lie in bin b_i, the cumulative histogram P(b), normalized by the total
number of pixels T, is defined as


\[ P(b) = \sum_{b_i < b} \frac{f(b_i)}{T} \]

\[ T = \sum_{b_i} f(b_i). \]


A naïve contrast equalization formula may be constructed from the cumulative his- 
togram and the minimum and maximum display luminances, as follows. 


\[ \log(L_d(x, y)) = \log(L_{d,\min}) + \bigl(\log(L_{d,\max}) - \log(L_{d,\min})\bigr)\, P(\log L_w(x, y)) \]
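
The naive mapping can be sketched in Python as follows. This is a simplified
illustration, not the published implementation: it skips the downsampling step and the
lower bound of −4, looks up P in the bin that contains the pixel without interpolating
between bins, and uses helper names of our own choosing.

```python
import math


def build_histogram(log_lums, n_bins=100):
    """Histogram of log luminances; returns (counts, lower bound, bin width)."""
    lo, hi = min(log_lums), max(log_lums)
    if hi <= lo:
        raise ValueError("need at least two distinct log luminance values")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for b in log_lums:
        idx = min(int((b - lo) / width), n_bins - 1)  # top edge goes in last bin
        counts[idx] += 1
    return counts, lo, width


def cumulative(counts):
    """Cumulative histogram P, normalized by the total pixel count T."""
    total = sum(counts)
    running, result = 0, []
    for count in counts:
        running += count
        result.append(running / total)
    return result


def naive_equalize(log_lw, counts, lo, width, ld_min, ld_max):
    """Map one log world luminance to a display luminance."""
    p = cumulative(counts)
    idx = min(max(int((log_lw - lo) / width), 0), len(counts) - 1)
    log_ld = math.log(ld_min) + (math.log(ld_max) - math.log(ld_min)) * p[idx]
    return math.exp(log_ld)
```

Because the mapping follows the cumulative histogram, heavily populated luminance
ranges receive a large share of the display range.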


This approach has a major flaw in that wherever there is a peak in the histogram, 
contrasts may be expanded rather than compressed. Exaggeration of contrast is 
highly undesirable and is avoidable through the following refinement. Based on 
the observation that linear tone mapping produces reasonable results for images 
with a limited dynamic range, contrasts due to histogram adjustment should not 
exceed those generated by linear scaling. That is, 


\[ \frac{dL_d}{dL_w} \le \frac{L_d}{L_w} \]

Because the cumulative histogram is the numerical integration of the histogram,
we may view the histogram itself as the derivative of the cumulative histogram,
provided it is normalized by T and the size of a bin δb is small, and thus

\[ \frac{dP(b)}{db} = \frac{f(b)}{T\,\delta b} \]

\[ \delta b = \frac{1}{N}\bigl(\log(L_{\max}) - \log(L_{\min})\bigr). \]

This naive histogram equalization gives an expression for the display luminance L_d
as a function of world luminance L_w. Its derivative may therefore be plugged into
the previous inequality to yield a ceiling on f(b), as follows.


\[ L_d \, \frac{f(\log(L_w))}{T\,\delta b} \, \frac{\log(L_{d,\max}) - \log(L_{d,\min})}{L_w} \le \frac{L_d}{L_w} \]

\[ f(b) \le \frac{T\,\delta b}{\log(L_{d,\max}) - \log(L_{d,\min})} \]


This means that as long as f(b) does not exceed this ceiling, contrast will not be 
exaggerated. For bins with a higher pixel count, the simplest solution is to trun- 
cate f(b) to the ceiling. Unfortunately, this changes the total pixel count T in the 
histogram, which by itself will affect the ceiling. This may be solved by an iterative 
scheme that stops if a certain tolerance is reached. Details of this approach are given 
in [142]. 
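
One possible form of such an iterative scheme is sketched below in Python. The stopping
rule (trimmed pixels falling below a fraction of the original pixel count), the value of
that fraction, and the iteration cap are assumptions made for this illustration. The
sketch also does not treat the case in which the scene range is already smaller than the
display range, where a linear mapping would be the more natural choice.

```python
import math


def truncate_histogram(counts, delta_b, ld_min, ld_max,
                       tolerance=0.025, max_iter=100):
    """Clamp bin counts to the linear ceiling, repeating as the total T shrinks."""
    counts = [float(c) for c in counts]
    display_range = math.log(ld_max) - math.log(ld_min)
    allowed = tolerance * sum(counts)   # trimmed pixels tolerated in the last pass
    for _ in range(max_iter):
        total = sum(counts)
        ceiling = total * delta_b / display_range
        trimmed = 0.0
        for i, count in enumerate(counts):
            if count > ceiling:
                trimmed += count - ceiling
                counts[i] = ceiling
        if trimmed <= allowed:
            break
    return counts
```

Each pass lowers T and therefore the ceiling, so bins that were acceptable in one pass
may need trimming in the next; the loop ends once a pass removes only a negligible
number of pixels.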

A second refinement is to limit the contrast according to human vision. The 
linear ceiling described previously assumes that humans detect contrast equally well 
over the full range of visible luminances. This assumption is not correct, prompting 
a solution that limits the contrast ceiling according to a just-noticeable difference 
function δL_t. This function takes an adaptation value L_a as a parameter, as follows.
(This is the same as the function used by Ferwerda’s model of visual adaptation. See 
Section 7.2.4.) 


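\[
\log_{10} \delta L_t(L_a) =
\begin{cases}
-2.86 & \text{for } \log_{10}(L_a) < -3.94 \\
(0.405 \log_{10}(L_a) + 1.6)^{2.18} - 2.86 & \text{for } -3.94 \le \log_{10}(L_a) < -1.44 \\
\log_{10}(L_a) - 0.395 & \text{for } -1.44 \le \log_{10}(L_a) < -0.0184 \\
(0.249 \log_{10}(L_a) + 0.65)^{2.7} - 0.72 & \text{for } -0.0184 \le \log_{10}(L_a) < 1.9 \\
\log_{10}(L_a) - 1.255 & \text{for } \log_{10}(L_a) \ge 1.9
\end{cases}
\]
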
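
The piecewise threshold function translates directly into code. The Python sketch below
assumes that the tabulated expressions give the base-10 logarithm of the just-noticeable
difference, so that the threshold itself is obtained by exponentiation; the function
names are ours.

```python
import math


def log10_jnd(log10_la):
    """Piecewise fit: log10 of the threshold as a function of log10 adaptation."""
    if log10_la < -3.94:
        return -2.86
    if log10_la < -1.44:
        return (0.405 * log10_la + 1.6) ** 2.18 - 2.86
    if log10_la < -0.0184:
        return log10_la - 0.395
    if log10_la < 1.9:
        return (0.249 * log10_la + 0.65) ** 2.7 - 0.72
    return log10_la - 1.255


def jnd(la):
    """Just-noticeable luminance difference at adaptation luminance la (> 0)."""
    return 10.0 ** log10_jnd(math.log10(la))
```

In the two linear segments the threshold is proportional to the adaptation luminance,
which corresponds to Weber-law behavior. Below the lowest breakpoint the threshold is
constant.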
FIGURE 7.21 Example images tone mapped with the histogram adjustment operator. The
mappings produced for these images are plotted in Figure 7.22.


This yields the following inequality and ceiling on f(b), which also requires an 
iterative scheme to solve. 
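
\[ \frac{dL_d}{dL_w} \le \frac{\delta L_t(L_d)}{\delta L_t(L_w)} \]

\[ f(b) \le \frac{\delta L_t(L_d)}{\delta L_t(L_w)} \cdot \frac{T\,\delta b\,L_w}{\bigl(\log_{10}(L_{d,\max}) - \log_{10}(L_{d,\min})\bigr)\, L_d} \]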
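
To make the structure of this ceiling concrete, the following Python sketch evaluates it
for a single bin. The threshold function is passed in as an argument so that different
models can be tried; the function name, the argument names, and the use of base-10
logarithms for the display range (following the formula as written) are assumptions of
this sketch.

```python
import math


def perceptual_ceiling(l_w, l_d, total, delta_b, ld_min, ld_max, jnd):
    """Upper limit on the count f(b) of the bin holding world luminance l_w.

    l_d is the display luminance currently assigned to l_w, total is the
    pixel count T, and jnd is a callable returning the threshold at a
    given luminance.
    """
    display_range = math.log10(ld_max) - math.log10(ld_min)
    ratio = jnd(l_d) / jnd(l_w)
    return ratio * total * delta_b * l_w / (display_range * l_d)
```

If the threshold is proportional to luminance (for example, jnd = lambda L: L), the
ratio of thresholds cancels against L_w / L_d and the result no longer depends on the
pixel: it equals T δb divided by the display range, which has the same form as the
linear ceiling. Because L_d itself depends on the histogram, the ceiling has to be
re-evaluated in each pass of the iteration.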


The result is a practical hands-off tone-reproduction operator that produces plausi- 
ble results for a wide variety of HDR images. Because the operator adapts to each 
image individually, the mapping of world luminance values to display values will 
be different for each image. As an example, two images are shown in Figure 7.21. 
The mappings for these two images are shown in Figure 7.22. 

Further enhancements model human visual limitations such as glare, color sen- 
sitivity, and visual acuity. Veiling glare is caused by bright light sources in the pe- 
riphery of vision, which cause light scatter in the ocular media. Light scatter causes 
a reduction of contrast near the projection of the glare source on the retina. 
FIGURE 7.22 Mapping of world luminances to display luminance for the images shown in
Figure 7.21 (axis label: log Lw).


In dark environments, color sensitivity is lost because only one type of receptor 
is active. In brighter environments, the three cone types are active and their relative 
activity is used by the human visual system to infer the spectral composition of the 
scene it is viewing. Finally, in dark environments visual acuity is lost because only 
very few rods are present in the fovea. 

The histogram adjustment technique may accommodate each of these effects, 
and we refer to Ward’s original paper for a full description [142]. Figure 7.23 shows 
a daytime image processed with the various options afforded by this operator, and 
Figure 7.24 shows the same applied to a nighttime image. 
FIGURE 7.23 Histogram adjustment with its various simulations of human visual limitations
for a daylight scene. In reading order: histogram adjustment, histogram adjustment with simula- 
tion of visual acuity loss, veiling glare, color sensitivity, and contrast sensitivity. The final image 
shows a combination of all of these. Compare with Figure 7.24, which shows the same techniques 
applied to a nighttime image. 
FIGURE 7.24 Histogram adjustment with its various simulations of human visual limitations
for a night scene. In reading order: histogram adjustment, histogram adjustment with simulation
of visual acuity loss, veiling glare, color sensitivity, and contrast sensitivity. The final image shows
a combination of all of these.
7.2.9 SCHLICK UNIFORM RATIONAL QUANTIZATION


Uniform rational quantization is aimed at providing an improved tone-reproduction 
operator as compared with simple logarithmic mappings and ad hoc procedures 
(such as gamma correction) for the purpose of dynamic range reduction. It was 
not developed to be an alternative to more complete perceptually-based operators. 
However, this method does provide a simple scheme with only two user parameters. 

A rational function is defined as the quotient of two polynomials. The specific
mapping function proposed by Schlick [113] is as follows.

\[ L_d(x, y) = \frac{p\,L_w(x, y)}{(p - 1)\,L_w(x, y) + L_{\max}}, \quad \text{where } p \in [1, \infty) \]


This function bears some resemblance to sigmoidal functions (see also Sec- 
tion 6.3.1), although instead of a semisaturation constant the maximum world lu- 
minance Lmax is used, and instead of an exponent to control the overall appearance 
a scale factor p is introduced. 

The value of p may be estimated such that the smallest value that is not black 
remains just visible after tone mapping. This JND δL_0 in quantized display luminance
steps should be specified by the user. A simple way of determining this value
is to show an image with various patches on a black background. The patches will 
vary in gray level. The user then selects the darkest patch that is just visible. The 
parameter p may then be approximated by 


\[ p = \frac{\delta L_0}{N} \frac{L_{\max}}{L_{\min}}, \]


where N is the number of different luminance levels that can be reproduced by the 
display device. For 8-bit display devices, its value will be 256. Figure 7.25 shows 
results created with different values of δL_0. A plot of this operator is shown in
Figure 7.26. An empirically determined refinement to the previous uniform rational 
quantization scheme uses the following slightly different form, which also depends 
on the pixel’s luminance value. 


\[ p = \frac{\delta L_0}{N} \frac{L_{\max}}{L_{\min}} \left( 1 - k + k\,\frac{L_w(x, y)}{\sqrt{L_{\min} L_{\max}}} \right) \]
FIGURE 7.25 Using Schlick’s uniform rational quantization, the just-noticeable difference from
black was set to 10^−7, 10^−6, 10^−5, and 10^−4.




FIGURE 7.26 Schlick’s uniform rational quantization shown for a value of δL_0 = 1.


Schlick’s original intent was to extend the uniform rational quantization function to 
be spatially varying. The pixel’s luminance value in this formulation would then be 
replaced by a weighted average of the pixel’s luminance and its neighbors. However, 
he found that the best results were obtained by making the local neighborhood no 
larger than the pixel itself. This yields the previous formulation, which is no longer
spatially varying. 

The user parameter k should be specified in the range [0,1]. Its effect on a tone- 
mapped image is shown in Figure 7.27. 

Schlick’s operator produces plausible results and is computationally efficient. 
However, it may be somewhat difficult to find values for the two user parameters 
without some experimentation. 
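
The basic (spatially uniform) form of the operator is short enough to sketch in full. The
Python code below estimates p from the just-visible step δL_0 and applies the rational
mapping to a single luminance value. The function names are ours, the refinement with
user parameter k is left out, and clamping p to a minimum of 1 is our own safeguard to
respect the stated range of p.

```python
def schlick_p(delta_l0, n_levels, l_min, l_max):
    """Estimate p so that the darkest non-black value remains just visible."""
    p = delta_l0 * l_max / (n_levels * l_min)
    return max(1.0, p)   # keep p inside [1, inf); this clamp is an assumption


def schlick(l_w, p, l_max):
    """Rational mapping of world luminance l_w to a display value in [0, 1]."""
    return p * l_w / ((p - 1.0) * l_w + l_max)
```

For p = 1 the mapping reduces to linear scaling by L_max, and for any p the maximum
world luminance maps to 1. With δL_0 = 1, N = 256, L_min = 0.01, and L_max = 100 the
estimate gives p = 39.0625, and the darkest value then maps to approximately 1/256 of
the display range (that is, about one quantization step).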
FIGURE 7.27 User parameter k of Schlick’s uniform rational quantization varied between 0.0
and 1.0 in steps of 0.2. 
7.3 LOCAL OPERATORS


Global operators are characterized by a mapping of world luminances to display
luminances that is identical for all pixels (i.e., a single tone-mapping curve
is used throughout the image). This makes them computationally efficient, but
there is a limit to the dynamic range of the input image beyond which successful
compression becomes difficult. Global operators are of necessity monotonically
increasing operators. Otherwise, visually unpleasant artifacts will be introduced.
Because display devices usually cannot accommodate more than 256 levels,
all world luminances must be mapped to that range and quantized to unit increments.
The higher the dynamic range of an image, the more values must be
mapped to 256 different numbers by a monotonically increasing function. For extreme
HDR images this will almost inevitably lead to loss of visibility or contrast, or
both.

Thus, global operators are limited in their capacity to compress HDR images. To 
some extent this limit may be lifted by local operators by compressing each pixel 
according to its luminance value, as well as to the luminance values of a set of 
neighboring pixels. Thus, instead of anchoring the computation to a globally derived
quantity (such as the image’s log average value), for each pixel the computation
is adjusted according to an average over a local neighborhood of pixels.

Local operators more often than not mimic features of the human visual system. 
For instance, a reasonable assumption is that a viewer does not adapt to the scene 
as a whole, but to smaller regions instead. An active observer’s eyes tend to wander 
across the scene, focusing on different regions. For each focus point, there is a 
surrounding region that helps determine the state of adaptation of the viewer. 

For tone-reproduction operators this has the implication that we may be able 
to compute an adaptation level individually for each pixel by considering the pixel 
itself and a set of neighboring pixels. Classic problems to be solved by local tone- 
reproduction operators are to determine how many neighboring pixels need to be 
included in the computation, how to weight each neighboring pixel’s contribution 
to the local adaptation level, and how to use this adaptation level within a compres- 
sive function. These issues are solved differently by the operators described in this 
section. 
7.3.1 CHIU SPATIALLY VARIANT OPERATOR


The first to observe that a spatially varying operator may be useful for tone repro- 
duction were Chiu et al. [9]. They noted that artists frequently make use of spatially 
varying techniques to fool the eye into thinking that a much larger dynamic range 
is present in artwork than actually exists. In particular, the areas around bright fea- 
tures may be dimmed somewhat to accentuate them. The basic formulation of their 
operator, as follows, multiplies each pixel’s luminance by a scaling factor s(x, y), 
which depends on the pixel itself and its neighbors. 


\[ L_d(x, y) = s(x, y)\,L_w(x, y) \]


For s(x, y) to represent a local average, we may produce a low-pass filtered version 
of the input image. Chiu et al. note that most low-pass filters produce similar results. 
For demonstration purposes, we show the technique with a Gaussian filter with a 
width controlled by a user parameter (see Section 7.1.4). 

In a blurred image, each pixel represents a weighted local average of the pixels
around the corresponding position in the input image. The reciprocal of these blurred
pixels may be used to compress HDR images, as follows.


\[ L_d(x, y) = \frac{L_w(x, y)}{k\,L_w^{\mathrm{blur}}(x, y)} \]


Here, k is a constant of proportionality (a user parameter) that controls the weight 
given to the blurred image relative to the unblurred input. This approach immedi- 
ately highlights one of the main problems faced by all local operators: halos arise 
around bright features. These halos, or contrast reversals, are more often disturb- 
ing than helpful. However, at the same time we have argued that artists use such 
dimming of areas around bright objects with great success. We conclude that some 
halos are good and some are bad. Finding a spatially variant tone-reproduction op- 
erator that does not produce obtrusive halos is a challenge. In our opinion, some 
operators succeed better than others. 
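
The origin of these halos is easy to reproduce on a single scanline. The Python sketch
below is a one-dimensional simplification under our own assumptions: the Gaussian is
truncated at three standard deviations, borders are handled by clamping, luminances are
strictly positive, and sigma (in pixels) stands in for the kernel width.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights, truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Low-pass filter a scanline, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    result = []
    for x in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(x + j - radius, 0), n - 1)
            acc += w * signal[idx]
        result.append(acc)
    return result


def chiu(signal, k, sigma):
    """Divide each pixel by k times its local (blurred) average."""
    blurred = blur(signal, sigma)
    return [lw / (k * lb) for lw, lb in zip(signal, blurred)]
```

Applied to a step edge such as 20 pixels of luminance 1 followed by 20 pixels of
luminance 100, pixels far from the edge on either side map to 1/k. The dark pixels next
to the edge are divided by a local average that already contains bright pixels and are
therefore pushed far below their neighbors, while the bright pixels next to the edge are
boosted. This is the halo discussed above.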

To illustrate the haloing problem, we created a series of images (using the pre- 
vious formulation) with different values of k, shown in Figure 7.28. This places 
more or less weight on the Gaussian blurred image, which was chosen with a kernel
size of 128 pixels in each case. Because k controls the relative contribution of the
FIGURE 7.28 The relative weight of the Gaussian-blurred image was controlled by specifying
user parameter k, which was given values of 1, 2, 4, 8, and 16. This parameter thus varies the 
strength of the halo. 
FIGURE 7.29 Effect of kernel size on image quality in Chiu’s operator. The width of the
Gaussian kernel was varied from 4 pixels to 128 pixels, doubling its width with each consec- 
utive image. 




Gaussian blurred image to the final result, it affects the strength of the halos and the 
amount of achievable compression. 

Whereas k controls the depth of the Gaussian, the kernel size may also be varied 
by a user parameter. Its effect is shown in Figure 7.29. The extreme haloing effect 
seen for small Gaussians starts to disappear for larger Gaussians. The image created 
with a kernel size of 128 pixels looks plausible. Although the halos in the bottom 
right-hand image of Figure 7.29 are not absent, we believe that here the transition 
between bad halos and good halos can be seen. This figure is in agreement with the 
observation made by Chiu et al. that wide filter kernels need to be used for local 
operators of this form to produce plausible results. 

Chiu’s original implementation included a smoothing stage that would iterate at 
least 1,000 times over the image with a small filter kernel. This would somewhat 
reduce the effect of contrast reversals. This approach is too expensive to be practical, 
however, and we therefore did not include this stage in our experimentation. 

Chiu’s work is intended to be exploratory and is of interest because it highlights 
the issues faced by other local tone-reproduction operators. Depending on the application,
halos may be desirable or completely undesirable. In any case, contrast
reversals are a feature of most spatially varying operators. The extent to which they 
are visible depends on the method chosen and the amount of parameter tuning 
applied. 


7.3.2 RAHMAN RETINEX 


Whereas Chiu’s work is exploratory and is not advertised as a viable tone- 
reproduction operator, Rahman and Jobson developed their interpretation of the 
retinex theory for use in various applications, including tone reproduction [59, 
102,103]. However, the differences between their approach and Chiu’s are rela- 
tively minor. They too divide the image by a Gaussian-blurred version with a wide 
filter kernel. 

Their operator comes in two different forms: single-scale and multiscale. In the
single-scale version, Chiu’s model is followed closely, although the algorithm operates
in the log domain. However, the placement of the logarithms is somewhat peculiar
(namely, after the image is convolved with a Gaussian filter kernel), as follows.


\[ I_d(x, y) = \exp\bigl(\log(I_w(x, y)) - k \log(I_w^{\mathrm{blur}}(x, y))\bigr) \]


This placement of logarithms is empirically determined to produce visually im- 
proved results. We add exponentiation to the results to return to a linear image. 

Note that this operator works independently on the red, green, and blue chan- 
nels, rather than on a single luminance channel. This means that the convolution 
that produces a Gaussian-blurred image needs to be repeated three times per image. 

In the multiscale retinex version, this equation is repeated several times for Gaussians
with different kernel sizes. This results in a stack of images, each image blurred
by increasing amounts. In the following, an image at level n will be denoted I_{w,n}^{blur}.
In the examples we show in this section, we used a stack of six levels and made the
smallest Gaussian filter kernel two pixels wide. Each successive image is convolved
with a Gaussian twice as large as that of the previous image in the stack.

The multiscale retinex version is then simply the weighted sum of a set of single- 
scale retinexed images. The weight given to each scale is determined by the user. 
We have found that for experimentation it is convenient to weight each level by a 
power function, which gives straightforward control over the weights. For an image 
stack with N levels, the normalized weights are then computed by 


\[ w_n = \frac{(N - n - 1)^f}{\sum_{m=0}^{N} (N - m - 1)^f} \]


A family of curves of this function is plotted in Figure 7.30. The user parameter 
f determines the relative weighting of each of the scales. For equal weighting, f 
should be set to 0. To give smaller scales more weight, f should be given a positive 
value (such as 0.1 or 0.2). If the larger scales should be emphasized, f should be 
given negative values. The multiscale retinex takes the following form. 


\[ I_d(x, y) = \exp\left( \sum_{n=0}^{N} w_n \bigl( \log(I_w(x, y)) - k \log(I_{w,n}^{\mathrm{blur}}(x, y)) \bigr) \right) \]


The two user parameters are k and f, which are in many ways equivalent to the user 
parameters required to control Chiu’s operator. The value of k specifies the relative 
weight of the blurred image. Larger values of k will cause the compression to be 
FIGURE 7.30 Weight factors w_n as a function of scale n for different user parameters f. The
values for f used a range from −0.3 to 0.3 in steps of 0.1.


more dramatic, but also create bigger halos. Parameter f, which controls the relative 
weight of each of the scales, determines which of the Gaussian-blurred images 
carries the most importance. This is more or less equivalent to setting the spatial 
extent of the Gaussian in Chiu’s method. With these two parameters we therefore 
expect to be able to control the operator, balancing amount of compression against 
severity of the artifacts. This is indeed the case, as Figures 7.31 and 7.32 show. 
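
For a single pixel and a single color channel, the multiscale combination amounts to a
weighted sum in the log domain followed by exponentiation. The Python sketch below
assumes that the blurred values of the pixel at each scale and the normalized weights
have already been computed elsewhere; the function name and argument layout are ours.

```python
import math


def multiscale_retinex(i_w, blurred, weights, k):
    """Combine single-scale retinex outputs for one pixel of one channel.

    i_w     : input value of the pixel (> 0)
    blurred : Gaussian-blurred values of the same pixel, one per scale (> 0)
    weights : normalized scale weights w_n, assumed to sum to 1
    k       : relative weight of the blurred images
    """
    total = 0.0
    for w_n, i_blur in zip(weights, blurred):
        total += w_n * (math.log(i_w) - k * math.log(i_blur))
    return math.exp(total)
```

With a single scale of weight 1 this reduces to the single-scale form, so a pixel of
value 8 whose blurred value is 2 maps to 4 when k = 1. With k = 0 the blurred images
have no influence and the input is returned unchanged. The same function would be
called once for each of the red, green, and blue channels.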

In summary, Rahman and Jobson’s interpretation of Land’s retinex theory is similar
to the exploratory work by Chiu. There are three main differences. First, the algorithm
works in the log domain, which causes contrasts at large image values to lie closer
together. This generally results in fewer issues with haloing. Second, the algorithm 
operates on the three color channels independently. This approach is routinely fol- 
FIGURE 7.31 In Rahman’s retinex implementation, parameter k controls the relative weight of
the Gaussian-blurred image stack. Here, k is varied from 0.0 to 0.5, and 1.0 (in reading order).


FIGURE 7.32 In Rahman’s retinex implementation, the Gaussian-blurred images may be
weighted according to scale. The most important scales are selected with user parameter f, which
is varied between −2 and 2. Equal weight is given for a value of f = 0, shown on the left of the
middle row.
lowed by various color appearance models (see, for instance, the CIECAM02 model
discussed in Section 2.8, and the iCAM model discussed in the following section). Finally,
this work operates on multiple scales that are weighted relative to one another
by a user-specified parameter. Multiscale techniques are well known in the litera- 
ture, including the tone-reproduction literature. Other examples of multiscale tech- 
niques are the multiscale observer model, Ashikhmin’s operator, and photographic 
tone reproduction, described respectively in Sections 7.3.4, 7.3.5, and 7.3.6. 


7.3.3 FAIRCHILD iCAM 


Although most operators discussed in this chapter are aimed at dynamic range re- 
duction, Pattanaik’s multiscale observer model [94] (discussed in the following sec- 
tion) and Fairchild’s iCAM model [30] are both color appearance models. 

Most color appearance models, such as CIECAM97, CIECAM02, and the Hunt
model, are intended for use in simplified environments. It is normally assumed
that a uniform patch of color is viewed on a larger uniform background with a 
different color. The perception of this patch of color may then be predicted by these 
models with the XYZ tristimulus values of the patch and a characterization of its 
surround as input, as described in Section 2.8. 

Images tend to be more complex than just a patch on a uniform background. The 
interplay between neighboring pixels may require a more complex spatially variant 
model that can account for the local adaptation of regions around each pixel. This 
argument in favor of spatially variant color appearance models is effectively the same 
as the reasoning behind spatially variant tone-reproduction operators. The parallels 
between the iCAM model described here and operators such as Chiu’s and Rahman’s 
are therefore unmistakable. However, there are also sufficient differences to make a 
description of the model worthwhile. 

The iCAM image appearance model is a direct refinement and simplification of 
the CIECAM02 color appearance model [30,61]. It omits the sigmoidal compression
found in CIECAM02 but adds spatially variant processing in the form of two separate 
Gaussian-blurred images that may be viewed as adaptation levels. Like most color 
appearance models, the model needs to be applied in the forward direction and in 
the reverse direction.

The input to the model is expected to be specified in XYZ device-independent
coordinates. Like CIECAM02, the model uses various color spaces to execute the
stages of the algorithm. The first stage is a chromatic adaptation transform, for
which sharpened cone responses are used. Sharpened cone responses are obtained
with the $M_{\mathrm{CAT02}}$ transform, given in Section 2.4.

The chromatic adaptation transform pushes the colors in the image toward the
D65 white point. The amount of adaptation in this von Kries transform is determined
by a user parameter D, which specifies the degree of adaptation. In addition, for
each pixel a white point W(x, y) is derived from the XYZ image by applying a low-
pass filter with a kernel a quarter the size of the image. This may be applied to each
color channel independently for chromatic adaptation, or on the Y channel only
for achromatic adaptation. This low-pass filtered image is then also converted with
the $M_{\mathrm{CAT02}}$ matrix. Finally, the D65 white point, given by the triplet
$Y_W$ = (95.05, 100.0, 108.88), is also converted to sharpened cone responses. The subsequent von
Kries adaptation transform is given by the following.


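$$
\begin{aligned}
R_c(x, y) &= R'(x, y)\left(\frac{Y_W\, D}{W'_R(x, y)} + 1 - D\right) \\
G_c(x, y) &= G'(x, y)\left(\frac{Y_W\, D}{W'_G(x, y)} + 1 - D\right) \\
B_c(x, y) &= B'(x, y)\left(\frac{Y_W\, D}{W'_B(x, y)} + 1 - D\right)
\end{aligned}
$$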


This transform effectively divides the image by a filtered version of the image. This 
step of the iCAM model is therefore similar to Chiu’s and Rahman’s operators. In 
those operators, the trade-off between amount of available compression and pres- 
ence of halos is controlled by a scaling factor k. Here, D plays the role of the scaling 
factor. We may therefore expect this parameter to have the same effect as k in Chiu’s 
and Rahman’s operators. However, in the previous equation D also determines the 
amount of chromatic adaptation. It serves the same role as the degree of adaptation 
parameter found in other color appearance models (compare, for instance, with 
CIECAM02, described in Section 2.8).

For larger values of D, the color of each pixel is pushed closer to the D65 white
point. Hence, in the iCAM model the separate issues of chromatic adaptation, halo-
ing, and amount of compression are directly interrelated.
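The per-pixel von Kries adaptation can be sketched for a single channel as follows. This is a minimal illustration that assumes the sharpened cone response and its low-pass filtered white point are already available. The function name `von_kries_adapt` and the scalar reference level `y_w` are hypothetical choices made for this sketch and are not part of the iCAM specification.

```python
def von_kries_adapt(value, white, d, y_w=100.0):
    """Adapt one sharpened cone response toward the reference white.

    value : response R', G', or B' of the pixel
    white : low-pass filtered white point at the same pixel (must be > 0)
    d     : degree of adaptation D in [0, 1]
    y_w   : reference white level (assumed to be a scalar in this sketch)
    """
    if white <= 0.0:
        raise ValueError("white point must be positive")
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    return value * (y_w * d / white + 1.0 - d)


# D = 0 leaves the pixel untouched; D = 1 divides by the local white point.
for d in (0.0, 0.5, 1.0):
    print(d, von_kries_adapt(40.0, 80.0, d))
```

Because the same D scales the division by the blurred image, raising it increases both the chromatic adaptation and the local compression, which is the coupling described above.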

Figure 7.33 shows the effect of parameter D, which was given values of 0.0, 
0.5, and 1.0. This figure also shows the effect of computing a single white point 
shared between the three values of each pixel and computing a separate white point 
for each color channel independently. For demonstration purposes, we have chosen 
an image with a higher dynamic range than usual. The halo visible around the 
light source is therefore more pronounced than for images with a medium dynamic 
range. Like Chiu’s and Rahman’s operators, the iCAM model appears most suited for 
medium-dynamic-range images. 


FIGURE 7.33 The iCAM image appearance model. Top row: luminance channel used as adap-
tation level for all three channels. Bottom row: channels are processed independently. From left to
right the adaptation parameter D was varied from 0.0 to 0.5 and 1.0.

After the chromatic adaptation transform, further compression is achieved by an
exponential function executed in LMS cone space (see Section 2.4). The exponential 
function that compresses the range of luminances is given by the following. 


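$$
\begin{aligned}
L'(x, y) &= |L(x, y)|^{0.43\, F_L(x, y)} \\
M'(x, y) &= |M(x, y)|^{0.43\, F_L(x, y)} \\
S'(x, y) &= |S(x, y)|^{0.43\, F_L(x, y)}
\end{aligned}
$$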


The exponent is modified on a per-pixel basis by $F_L$, which is a function of a
spatially varying surround map derived from the luminance channel (Y channel)
of the input image. The surround map S(x, y) is a low-pass filtered version of this
channel with a Gaussian filter kernel size of one-third the size of the image. The
function $F_L$ is then given by the following.


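$$
F_L(x, y) = \frac{1}{1.7}\left[\,0.2\left(\frac{1}{5S(x, y) + 1}\right)^{4}\bigl(5S(x, y)\bigr)
+ 0.1\left(1 - \left(\frac{1}{5S(x, y) + 1}\right)^{4}\right)^{2}\bigl(5S(x, y)\bigr)^{1/3}\right]
$$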


Thus, this computation of $F_L$ may be seen as the spatially variant extension of
CIECAM02’s factor for partial adaptation, given in Equation 2.1.
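To make the behavior of the per-pixel exponent concrete, the following sketch evaluates the surround factor and the resulting compression for a single cone response. It assumes the surround value has already been obtained by low-pass filtering the Y channel. The names `surround_factor` and `compress` are hypothetical helpers introduced only for this illustration.

```python
def surround_factor(s):
    """Spatially variant factor F_L for a surround luminance s >= 0."""
    k4 = (1.0 / (5.0 * s + 1.0)) ** 4
    return (0.2 * k4 * (5.0 * s)
            + 0.1 * (1.0 - k4) ** 2 * (5.0 * s) ** (1.0 / 3.0)) / 1.7


def compress(response, s):
    """Raise one cone response to the per-pixel exponent 0.43 * F_L."""
    return abs(response) ** (0.43 * surround_factor(s))


# A brighter surround raises the exponent, so the same response is
# compressed less strongly.
for s in (0.1, 1.0, 10.0, 100.0):
    print(s, round(surround_factor(s), 4), round(compress(500.0, s), 4))
```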


This step completes the forward application of the iCAM model. To prepare the 
result for display, the inverse model should be applied. The model requires the same 
color spaces to be used as in the forward model in each of the steps. The first step 
is to invert the previous exponentiation, as follows. 


$$
\begin{aligned}
L'(x, y) &= |L(x, y)|^{1/0.43} \\
M'(x, y) &= |M(x, y)|^{1/0.43} \\
S'(x, y) &= |S(x, y)|^{1/0.43}
\end{aligned}
$$


The inverse chromatic adaptation transform does not require a spatially variant
white point, but converts from a global D65 white point $Y_W$ = (95.05, 100.0, 108.88)
to an equiluminant white point $Y_E$ = (100, 100, 100). Because full adaptation is as-
sumed, D is set to 1 and this transform simplifies to the following scaling, which
is applied in sharpened cone response space.


$$
\begin{aligned}
R' &= R\,\frac{Y_E}{Y_W} \\
G' &= G\,\frac{Y_E}{Y_W} \\
B' &= B\,\frac{Y_E}{Y_W}
\end{aligned}
$$


After these two steps are executed in their appropriate color spaces, the final steps 
consist of clipping the top 99% of all pixels, normalization, and gamma correction. 
The user parameters for this model are D, as discussed previously, and a prescaling 
of the input image. This prescaling may be necessary because the iCAM model 
requires the input to be specified in cd/m². For arbitrary images, this requires the
user to scale the image to its appropriate range prior to tone mapping. The effect
of pre-scaling is shown in Figure 7.34. For images that contain values that are too
small, a red shift is apparent. If the values in the image are too large, the overall
appearance of the image becomes too dark.³
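The final display preparation may be sketched as follows. This is only an illustration under stated assumptions: the clipping step is interpreted here as clipping at the 99th percentile, the percentile is computed with the nearest-rank method, and the display gamma of 2.2 is an assumed value. The function name `prepare_for_display` is hypothetical.

```python
import math


def prepare_for_display(values, percentile=99.0, gamma=2.2):
    """Clip at a percentile, normalize to [0, 1], then gamma-correct."""
    ordered = sorted(values)
    # Nearest-rank percentile of the input values.
    rank = max(1, int(math.ceil(len(ordered) * percentile / 100.0)))
    clip_level = ordered[rank - 1]
    if clip_level <= 0.0:
        raise ValueError("clip level must be positive")
    out = []
    for v in values:
        normalized = min(max(v, 0.0), clip_level) / clip_level
        out.append(normalized ** (1.0 / gamma))
    return out


pixels = [float(i) for i in range(1, 101)]
display = prepare_for_display(pixels)
print(display[0], display[49], display[-1])
```

Normalizing by the clip level rather than by the maximum is what brightens the overall result, because a few very bright pixels no longer determine the scale.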

Further parameters for consideration are the kernel sizes of the two Gaussian 
filters. For the images shown in this section, we used the recommended kernel sizes 
of 1/4 and 1/3 the size of the image, but other sizes are possible. As with Chiu’s 
and Rahman’s operators, the precise kernel size is unimportant, as long as the filter 
width is chosen to be large. 


FIGURE 7.34 Effect of pre-scaling on the iCAM model. The factor used for the top left-hand
image was 0.01 and each subsequent image was scaled with a factor 10 times larger than the
previous image.


3 The images in this figure, as with similar image sequences for other operators, were scaled beyond a reasonable range — 
too small and too large — to show the effect of the parameter. It should be noted that in practice a reasonable parameter 
setting should be chosen to avoid such extremes.

In summary, the iCAM model consists of two steps: a chromatic adaptation step
followed by an exponential function. The achromatic adaptation step strongly re- 
sembles Chiu’s and Rahman’s operators because the image is divided by a blurred 
version of the image. The second step may be viewed as an advanced form of gamma 
correction, whereby the gamma factor is modulated on a per-pixel basis. The for- 
ward model needs to be followed by the inverse application of the model to prepare 
the image for display. A final clipping and normalization step brightens the overall 
appearance. The model is best suited for images with a medium dynamic range, in 
that the trade-off between compression and presence of halos is less critical for this 
class of images than for extreme HDR images. 


7.3.4 PATTANAIK MULTISCALE OBSERVER MODEL 


Pattanaik’s multiscale observer model ranks among the more complete color appear-
ance models and consists of several steps executed in succession [94]. The outputs
of this model (and of all other color appearance models) are color appearance cor-
relates, as discussed in Section 2.8. A tone-reproduction operator may be derived
from these correlates by executing the inverse model and substituting characteristics 
of the display device into the equations in the appropriate place. 

For simplicity, we present a version of the model that is reduced in complexity. 
For the purpose of tone reproduction, some of the forward and backward steps of 
the model cancel out and may therefore be omitted. In addition, compared to the 
original model we make small changes to minimize visual artifacts, for instance by 
choosing the filter kernel sizes smaller than in the original model. We first give a 
brief overview of the full model and then detail a simplified version. 

The first step in the forward model is to account for light scatter in the ocu- 
lar media, followed by spectral sampling to model the photoreceptor output. This 
yields four images representing the rods and the L, M, and S cones. These four 
images are then each spatially decomposed into seven-level Gaussian pyramids and 
subsequently converted into four six-level difference-of-Gaussian (DoG) stacks that 
represent bandpass behavior as seen in the human visual system. DoGs are com- 
puted by subtracting adjacent images in the pyramid. 

The next step consists of a gain control system applied to each of the DoGs 
in each of the four channels. The shape of the gain control function resembles TVI
curves such that the results of this step may be viewed as adapted contrast pyramidal
images. The cone signals are then converted into a color opponent scheme that 
contains separate luminance, red-green, and yellow-blue color channels. The rod 
image is retained separately. 

Contrast transducer functions that model human contrast sensitivity are then 
applied. The rod and cone signals are recombined into an achromatic channel, as 
well as red-green and yellow-blue color channels. A color appearance map is formed 
next, which is the basis for the computation of the aforementioned appearance 
correlates. This step cancels in the inverse model, and we therefore omit a detailed 
description. We also omit computing the rod signals because we are predominantly 
interested in photopic lighting conditions. 

The model calls for low-pass filtered copies with spatial frequencies of 0.5, 1, 
2, 4, 8, and 16 cycles per degree (cpd). Specifying spatial frequencies in this man- 
ner is common practice when modeling the human visual system. However, for a 
practical tone-reproduction operator this would require knowledge of the distance 
of the observer to the display device and the spatial resolution of the display device. 
Because viewer distance is difficult to control, let alone anticipate, we restate spatial 
frequencies in terms of cycles per pixel (cpp). 

Further, we omit the initial modeling of light scatter in the ocular media. Model- 
ing light scatter would have the effect of introducing a small amount of blur in the 
image, particularly near areas of high luminance. On occasion, modeling of glare 
may be important and desirable and should be included in a complete implementa- 
tion of the multiscale observer model. However, for simplicity we omit this initial 
processing. This set of simplifications allows us to focus on the part of the multiscale 
observer model that achieves dynamic range reduction. 

The model expects input to be specified in LMS cone space, discussed in Sec- 
tion 2.4. The compressive function applied in all stages of the multiscale observer 
model is given by the following gain control. 


$$
G(L) = \frac{1}{0.555\,(L + 1)^{0.85}}
$$
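As a quick numerical illustration of this gain control, the sketch below evaluates G(L) and the product L·G(L) over several orders of magnitude. The function name `gain` is a hypothetical helper for this example.

```python
def gain(lum):
    """Gain control G(L) = 1 / (0.555 * (L + 1) ** 0.85) for L >= 0."""
    return 1.0 / (0.555 * (lum + 1.0) ** 0.85)


# The product L * G(L) grows far more slowly than L itself, which is
# where the dynamic range reduction comes from.
for lum in (0.01, 1.0, 100.0, 10000.0):
    print(lum, round(gain(lum), 5), round(lum * gain(lum), 3))
```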


Multiplying either a low-pass or bandpass image by this gain control amounts to 
applying a sigmoid. Using the techniques presented in Section 7.1.4, a stack of 
seven increasingly blurred images is created next. The amount of blur is doubled
at each level, and for the smallest scale we use a filter kernel the size of which is
determined by a user parameter (discussed later in this section). An image at level 
s is represented by the following triplet. 


$$
\left(L_s^{\mathrm{blur}}(x, y),\; M_s^{\mathrm{blur}}(x, y),\; S_s^{\mathrm{blur}}(x, y)\right)
$$


From this stack of seven Gaussian-blurred images we may compute a stack of six 
DoG images that represent adapted contrast at six spatial scales, as follows. 


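$$
\begin{aligned}
L_s^{\mathrm{DoG}}(x, y) &= \left(L_s^{\mathrm{blur}}(x, y) - L_{s+1}^{\mathrm{blur}}(x, y)\right) G\!\left(L_{s+1}^{\mathrm{blur}}(x, y)\right) \\
M_s^{\mathrm{DoG}}(x, y) &= \left(M_s^{\mathrm{blur}}(x, y) - M_{s+1}^{\mathrm{blur}}(x, y)\right) G\!\left(M_{s+1}^{\mathrm{blur}}(x, y)\right) \\
S_s^{\mathrm{DoG}}(x, y) &= \left(S_s^{\mathrm{blur}}(x, y) - S_{s+1}^{\mathrm{blur}}(x, y)\right) G\!\left(S_{s+1}^{\mathrm{blur}}(x, y)\right)
\end{aligned}
$$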


The DoG scheme involves a division by a low-pass filtered image (through the gain 
control function), which may be viewed as a normalization step. This approach was 
followed in both Ashikhmin’s operator (see following section) and in the photo- 
graphic tone-reproduction operator (Section 7.3.6). DoGs are reasonable approxi-
mations of some of the receptive fields found in the human visual system.⁴ They
are also known as center-surround mechanisms. 
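The construction of the blurred stack and the gain-controlled DoGs can be sketched for one channel as follows. To keep the example short it operates on a 1-D signal, uses direct convolution rather than Fourier transforms, and expresses the smallest kernel as a standard deviation in pixels. It also assumes that the gain is evaluated on the coarser of the two levels. All function names are hypothetical.

```python
import math


def gain(lum):
    return 1.0 / (0.555 * (lum + 1.0) ** 0.85)


def gaussian_blur(signal, sigma):
    """Blur a 1-D list with a normalized Gaussian; borders are clamped."""
    radius = int(math.ceil(3.0 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in offsets]
    total = sum(weights)
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(x + i, 0), last)]
            for i, w in zip(offsets, weights)) / total
        for x in range(len(signal))
    ]


def blur_stack(signal, smallest_sigma, levels=7):
    """Seven increasingly blurred copies; the blur doubles at each level."""
    return [gaussian_blur(signal, smallest_sigma * 2 ** s)
            for s in range(levels)]


def adapted_contrast(stack):
    """Six gain-controlled DoG signals from seven low-pass signals."""
    dogs = []
    for fine, coarse in zip(stack[:-1], stack[1:]):
        dogs.append([(f - c) * gain(c) for f, c in zip(fine, coarse)])
    return dogs


edge = [1.0] * 16 + [50.0] * 16
dogs = adapted_contrast(blur_stack(edge, 0.5))
print(len(dogs), round(max(abs(v) for v in dogs[0]), 4))
```

In uniform regions every DoG is zero, so all contrast information is concentrated around edges, with finer levels responding closer to the edge.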

The low-pass image at level s = 7 is retained and will form the basis for im-
age reconstruction. In the final step of the forward model, pixels in this low-
pass image are adapted to a linear combination of themselves and the mean value
$(\bar{L}_7^{\mathrm{blur}}, \bar{M}_7^{\mathrm{blur}}, \bar{S}_7^{\mathrm{blur}})$ of the low-pass image, as follows.


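$$
\begin{aligned}
L_7^{\mathrm{final}}(x, y) &= L_7^{\mathrm{blur}}(x, y)\, G\!\left((1 - A)\,\bar{L}_7^{\mathrm{blur}} + A\, L_7^{\mathrm{blur}}(x, y)\right) \\
M_7^{\mathrm{final}}(x, y) &= M_7^{\mathrm{blur}}(x, y)\, G\!\left((1 - A)\,\bar{M}_7^{\mathrm{blur}} + A\, M_7^{\mathrm{blur}}(x, y)\right) \\
S_7^{\mathrm{final}}(x, y) &= S_7^{\mathrm{blur}}(x, y)\, G\!\left((1 - A)\,\bar{S}_7^{\mathrm{blur}} + A\, S_7^{\mathrm{blur}}(x, y)\right)
\end{aligned}
$$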


The amount of dynamic range reduction is determined by user parameter A in these 
equations, which takes a value between 0 and 1. The effect of this parameter on the 
appearance of tone-mapped images is shown in Figure 7.35. 
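The role of the interpolation parameter A may be illustrated with the following sketch for one channel of the low-pass image. With A = 0 every pixel receives the same gain, so the low-pass image is only scaled. With A = 1 each pixel is adapted to itself and the range of the low-pass image is compressed most strongly. The function names are hypothetical.

```python
def gain(lum):
    return 1.0 / (0.555 * (lum + 1.0) ** 0.85)


def adapt_lowpass(lowpass, a):
    """Adapt each low-pass pixel to a blend of the mean and itself."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    mean = sum(lowpass) / len(lowpass)
    return [v * gain((1.0 - a) * mean + a * v) for v in lowpass]


lowpass = [1.0, 10.0, 100.0, 1000.0]
for a in (0.0, 0.5, 1.0):
    out = adapt_lowpass(lowpass, a)
    # Ratio between brightest and darkest adapted pixel.
    print(a, round(max(out) / min(out), 2))
```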


4 A receptive field may be seen as the pattern of light that needs to be present to optimally stimulate a cell in the visual 
pathway.

FIGURE 7.35 Using the multiscale observer model, the interpolation parameter A was set to 0,
0.25, 0.50, 0.75, and 1.

The forward version of the multiscale observer model is based on the human vi-
sual system. Although we could display the result of the forward model, the viewer's
visual system would then also apply a similar forward model (to the extent that this 
model is a correct reflection of the human visual system). To avoid applying the 
model twice, the computational model should be reversed before an image is dis- 
played. During the reversal process, parameters pertaining to the display device are 
inserted in the model so that the result is ready for display. 

In the first step of the inverse model, the mean luminance $L_{d,\mathrm{mean}}$ of the target
display device needs to be determined. For a typical display device, this value may
be set to about 50 cd/m². A gain control factor for the mean display luminance is
determined, and the low-pass image is adapted once more, but now for the mean
display luminance, as follows.


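$$
\begin{aligned}
L_7^{\mathrm{inv}}(x, y) &= \frac{L_7^{\mathrm{final}}(x, y)}{G(L_{d,\mathrm{mean}})} \\
M_7^{\mathrm{inv}}(x, y) &= \frac{M_7^{\mathrm{final}}(x, y)}{G(M_{d,\mathrm{mean}})} \\
S_7^{\mathrm{inv}}(x, y) &= \frac{S_7^{\mathrm{final}}(x, y)}{G(S_{d,\mathrm{mean}})}
\end{aligned}
$$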


The stack of DoGs is then added to the adapted low-pass image one scale at a time, 
starting with s = 6 and followed by s = 5, 4, ..., 0, as follows. 


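$$
\begin{aligned}
L_s^{\mathrm{inv}}(x, y) &= \max\!\left(L_{s+1}^{\mathrm{inv}}(x, y) + \frac{L_s^{\mathrm{DoG}}(x, y)}{G\!\left(L_{s+1}^{\mathrm{inv}}(x, y)\right)},\; 0\right) \\
M_s^{\mathrm{inv}}(x, y) &= \max\!\left(M_{s+1}^{\mathrm{inv}}(x, y) + \frac{M_s^{\mathrm{DoG}}(x, y)}{G\!\left(M_{s+1}^{\mathrm{inv}}(x, y)\right)},\; 0\right) \\
S_s^{\mathrm{inv}}(x, y) &= \max\!\left(S_{s+1}^{\mathrm{inv}}(x, y) + \frac{S_s^{\mathrm{DoG}}(x, y)}{G\!\left(S_{s+1}^{\mathrm{inv}}(x, y)\right)},\; 0\right)
\end{aligned}
$$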
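The reconstruction may be sketched for a single channel as follows. The sketch assumes that the DoG signals are ordered from fine to coarse and that each was produced with the gain of its coarser neighbor. The function name `reconstruct` and the default mean display luminance of 50 are illustrative choices.

```python
def gain(lum):
    return 1.0 / (0.555 * (lum + 1.0) ** 0.85)


def reconstruct(lowpass_final, dogs, mean_display=50.0):
    """Rebuild a displayable signal from the adapted low-pass and DoGs.

    lowpass_final : adapted low-pass signal of the coarsest level
    dogs          : list of DoG signals, ordered from fine to coarse
    """
    g_display = gain(mean_display)
    current = [v / g_display for v in lowpass_final]
    # Add the bands back one scale at a time, coarsest band first.
    for dog in reversed(dogs):
        current = [max(c + d / gain(c), 0.0) for c, d in zip(current, dog)]
    return current


# Consistency check with two levels: undoing the forward gain on an
# unmodified low-pass signal recovers the finer signal.
fine = [2.0, 4.0, 8.0]
coarse = [3.0, 4.0, 6.0]
dog = [(f - c) * gain(c) for f, c in zip(fine, coarse)]
final = [c * gain(50.0) for c in coarse]
print(reconstruct(final, [dog]))
```

In actual use the low-pass signal has been adapted to the scene while the division uses the display gain, so the result differs from the input; that difference is the tone mapping.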


Finally, the result is converted to XYZ and then to RGB, where gamma correction 
is applied. The original formulation of this model shows haloing artifacts similar
to those of other local operators discussed in this chapter. One of the reasons for
this is that the model is calibrated in degrees of visual angle rather than in pixels.
The transformation from degrees of visual angle to pixels requires assumptions
about the size of the display and its resolution, as well as the distance between the
observer and the display. The size of the filter kernel used to create the low-pass
images is directly affected by these assumptions. For the purpose of demonstration, 
Figure 7.36 shows a sequence of images produced with different kernel sizes. Note 
that we only adjust the size of the smallest Gaussian. By specifying the kernel size 
for the smallest Gaussian, the size of all other Gaussians is determined. The figure 
shows that smaller Gaussians produce smaller halos, which are less obtrusive than 
the larger halos of the original model. 

The reconstruction of a displayable image proceeds by successively adding band- 
pass images back to the low-pass image. These bandpass images by default receive 
equal weight. It may be beneficial to weight bandpass images such that higher spa- 
tial frequencies contribute more to the final result. Although the original multiscale 
observer model does not feature such a weighting scheme, we have found that con- 
trast in the final result may be improved if higher frequencies are given a larger 
weight. This is shown in Figure 7.37, where each successive image places more 
emphasis on higher frequencies. The scale factor k used for these images relates to 
the index number s of the bandpass pyramid in the following manner. 


$$
k = (6 - s)\, g
$$


The constant g is a user parameter, which we vary between 1 and 5 in Figure 7.37. 
A larger value for g produces more contrast in the tone-mapped image, but if this 
value is chosen too large the residual halos present in the image are emphasized 
(which is generally undesirable). For uncalibrated images tone mapped with the 
multiscale observer model, different prescale factors cause the overall image appear- 
ance to be lighter or darker, as shown in Figure 7.38. 

The computational complexity of this operator remains high, and we would only 
recommend this model for images with an extreme dynamic range. If the amount 
of compression required for a particular image is less, simpler models likely suffice. 
The Fourier transforms used to compute the low-pass images are the main factor 
determining running time. There are seven levels in the Gaussian pyramid, and 
four color channels in the original model, resulting in 28 low-pass filtered images.
In our simplified model, we only compute three color channels, resulting in a total
of 21 low-pass images.

FIGURE 7.36 Using the multiscale observer model, the filter kernel size is set to 0.03, 0.06,
0.12, 0.25, and 0.5 in this sequence of images.

FIGURE 7.37 Relative scaling in the multiscale observer model. For a filter kernel size of 0.03,
the relative scaling parameter was set to 1, 2, 3, 4, and 5.

FIGURE 7.38 Effect of pre-scaling on the multiscale observer model. Images are pre-scaled by
factors of 0.001, 0.01, 0.1, 1, and 10.

The multiscale observer model is the first to introduce center-surround process- 
ing to the field of tone reproduction, which is also successfully employed in 
Ashikhmin’s operator (see following section) and in Reinhard et al.’s photographic
tone-reproduction operator (see Section 7.3.6). The halos present in the original 
model may be minimized by carefully choosing an appropriate filter kernel size. 


7.3.5 ASHIKHMIN SPATIALLY VARIANT OPERATOR 


The multiscale observer model aims at completeness in the sense that all steps of 
human visual processing that are currently understood well enough to be modeled 
are present in this model. It may therefore account for a wide variety of appearance 
effects. One may argue that such completeness is not strictly necessary for the more 
limited task of dynamic-range reduction. 

Ashikhmin’s operator attempts to model only those aspects of human visual per- 
ception that are relevant to dynamic-range compression [6]. This results in a signif- 
icantly simpler computational model consisting of three steps. First, for each point 
in the image a local adaptation value Lwa(x, y) is established. Next, a compres- 
sive function is applied to reduce the image’s dynamic range. As this step may cause 
some detail to be lost, a final pass reintroduces detail. Ashikhmin’s operator is aimed 
at preserving local contrast, which is defined as 


$$
c_w(x, y) = \frac{L_w(x, y)}{L_{wa}(x, y)} - 1
$$


In this definition, $L_{wa}$ is the world adaptation level for pixel (x, y). The conse-
quence of local contrast preservation is that visible display contrast $c_d(x, y)$, which
is a function of display luminance $L_d(x, y)$ and its derived local display adaptation
level $L_{da}(x, y)$, equals $c_w(x, y)$. This equality may be used to derive a function for
computing display luminances, as follows.


$$
\begin{aligned}
c_d(x, y) &= c_w(x, y) \\
\frac{L_d(x, y)}{L_{da}(x, y)} - 1 &= \frac{L_w(x, y)}{L_{wa}(x, y)} - 1 \\
L_d(x, y) &= L_{da}(x, y)\,\frac{L_w(x, y)}{L_{wa}(x, y)}
\end{aligned}
$$

The unknown in these equations is the local display adaptation value $L_{da}(x, y)$.
Ashikhmin proposes to compute this value for each pixel from the world adaptation 
values. Thus, the display adaptation luminances are tone-mapped versions of the 
world adaptation luminances, as follows. 


$$
L_{da}(x, y) = F\!\left(L_{wa}(x, y)\right)
$$

The complete tone-reproduction operator is then given by

$$
L_d(x, y) = F\!\left(L_{wa}(x, y)\right)\frac{L_w(x, y)}{L_{wa}(x, y)}
$$

There are now two subproblems to be solved. The functional form of the tone- 
mapping function F() needs to be given, and an appropriate local world adaptation 
level Lwa(x, y) needs to be computed. 
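The structure of the operator can be sketched independently of the two subproblems. In the illustration below the compressive function is passed in as an argument, and a square root is used purely as a placeholder for it. The adaptation levels are likewise supplied by hand. The function name `tone_map` is hypothetical.

```python
import math


def tone_map(world, world_adapt, f):
    """Compute L_d = F(L_wa) * L_w / L_wa for each pixel.

    world       : world luminances L_w
    world_adapt : local world adaptation levels L_wa (all > 0)
    f           : compressive function F applied to the adaptation level
    """
    return [f(wa) * w / wa for w, wa in zip(world, world_adapt)]


world = [2.0, 8.0]
adapt = [4.0, 4.0]
display = tone_map(world, adapt, math.sqrt)  # sqrt is only a placeholder
for w, wa, d in zip(world, adapt, display):
    # Local contrast is identical before and after tone mapping.
    print(w / wa - 1.0, d / math.sqrt(wa) - 1.0)
```

Only the adaptation level passes through the compressive function, so the ratio of each pixel to its adaptation level, and hence local contrast, is carried over unchanged.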

To derive the compressive function F(), Ashikhmin introduces the notion of 
perceptual capacity of a range of luminance values. Human sensitivity to luminance 
changes is given by TVI functions (see also Chapter 6). This may be used as a scaling 
factor for a small range of luminance values ΔL. The intuition behind this approach
is that the perceptual importance of a JND is independent of the absolute luminance 
value for which it is computed. For a range of world luminances between 0 and L, 
perceptual capacity C(L) may therefore be defined as follows. 


$$
C(L) = \int_0^L \frac{dx}{T(x)}
$$


Here, T(x) is the threshold versus intensity function. The perceptual capacity for an
arbitrary luminance range from $L_1$ to $L_2$ is then $C(L_2) - C(L_1)$. Following others,
the TVI function is approximated by four linear segments (in log-log space), and
thus the perceptual capacity function becomes


$$
C(L) =
\begin{cases}
L/0.0014 & \text{for } L < 0.0034 \\
2.4483 + \ln(L/0.0034)/0.4027 & \text{for } 0.0034 \le L < 1 \\
16.5630 + (L - 1)/0.4027 & \text{for } 1 \le L < 7.2444 \\
32.0693 + \ln(L/7.2444)/0.0556 & \text{otherwise.}
\end{cases}
$$
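A direct transcription of the perceptual capacity function is sketched below. It takes the logarithm to be the natural logarithm, which is an assumption of this sketch: integrating the reciprocal of a threshold that is proportional to luminance yields a natural logarithm, and with that choice the listed offsets 16.5630 and 32.0693 join the segments continuously. The function name `capacity` is hypothetical.

```python
import math


def capacity(lum):
    """Perceptual capacity C(L) for a luminance lum >= 0."""
    if lum < 0.0034:
        return lum / 0.0014
    if lum < 1.0:
        return 2.4483 + math.log(lum / 0.0034) / 0.4027
    if lum < 7.2444:
        return 16.5630 + (lum - 1.0) / 0.4027
    return 32.0693 + math.log(lum / 7.2444) / 0.0556


# Size of the jump at each segment boundary (small if segments join up).
for boundary in (0.0034, 1.0, 7.2444):
    below = capacity(boundary - 1e-9)
    above = capacity(boundary)
    print(boundary, round(above - below, 4))
```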


World adaptation luminances may now be mapped to display adaptation luminances 
such that perceptual world capacity is linearly mapped to a displayable range. As-
suming the maximum displayable luminance is given by $L_{d,\max}$, the compressive
function $F(L_{wa}(x, y))$ is given by


$$
F\!\left(L_{wa}(x, y)\right) = L_{d,\max}\,\frac{C\!\left(L_{wa}(x, y)\right) - C(L_{w,\min})}{C(L_{w,\max}) - C(L_{w,\min})}
$$

In this equation, $L_{w,\min}$ and $L_{w,\max}$ are the minimum and maximum world adapta-
tion luminances. Finally, the spatially variant world adaptation luminances are com-
puted in a manner akin to Reinhard’s dodge-and-burn operator, discussed in the
following section. The world adaptation luminance of a pixel is a Gaussian weighted 
average of pixel values taken over some neighborhood. The success of this method
depends on choosing the neighborhood such that the spatial extent
of the Gaussian filter does not cross any major luminance steps. As such, for each
pixel its neighborhood should be chosen as large as possible without crossing sharp 
luminance gradients. 
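The compressive function F, which maps world adaptation luminances linearly in perceptual capacity to the display range, can be sketched as follows. The capacity function is repeated here so that the example is self-contained, with the logarithm assumed to be natural. The default maximum display luminance of 100 and the function names are illustrative choices.

```python
import math


def capacity(lum):
    if lum < 0.0034:
        return lum / 0.0014
    if lum < 1.0:
        return 2.4483 + math.log(lum / 0.0034) / 0.4027
    if lum < 7.2444:
        return 16.5630 + (lum - 1.0) / 0.4027
    return 32.0693 + math.log(lum / 7.2444) / 0.0556


def compress_adaptation(l_wa, l_w_min, l_w_max, l_d_max=100.0):
    """Map a world adaptation luminance to a display adaptation luminance."""
    c_min = capacity(l_w_min)
    c_max = capacity(l_w_max)
    if c_max <= c_min:
        raise ValueError("l_w_max must exceed l_w_min")
    return l_d_max * (capacity(l_wa) - c_min) / (c_max - c_min)


for l_wa in (0.01, 1.0, 10.0, 1000.0):
    print(l_wa, round(compress_adaptation(l_wa, 0.01, 1000.0), 2))
```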

To determine whether a pixel neighborhood contains any large gradients, consider a
pixel of a Gaussian-filtered image with a filter kernel $R_s$ of size s, as well as the
same pixel position of a Gaussian-filtered image with a kernel of size 2s. Because 
Gaussian filtering amounts to computing a weighted local average, the two blurred 
pixels represent local averages of two differently sized neighborhoods. If these two 
averages are similar, no sharp gradients occurred in the pixel’s neighborhood. In 
other words, if the difference of these two Gaussian-filtered pixels is close to 0 the 
pixel’s neighborhood of size 2s is LDR. The difference of Gaussians is normalized 
by one of the Gaussian-filtered images, yielding a measure of band-limited local 
contrast $V_s$, as follows.


$$
V_s(x, y) = \frac{(L_w \otimes R_s)(x, y) - (L_w \otimes R_{2s})(x, y)}{(L_w \otimes R_s)(x, y)}
$$

These arguments are valid for any scale s. We may therefore compute a stack of
band-limited local contrasts for different scales s. The smallest scale is s = 1, and
each successive scale in Ashikhmin’s operator is 1 pixel larger than the previous. 
The largest scale is 10 pixels wide. 

Each successively larger scale difference of Gaussians tests a larger pixel neigh-
borhood. For each pixel, the smallest scale $s_t$ for which $V_{s_t}(x, y)$ exceeds a user-
specified threshold t is chosen. By default, the value of this threshold may be chosen
to be t = 0.5. The choice of threshold has an impact on the visual quality of the
operator. If a value of 0.0 is chosen, Ashikhmin’s operator defaults to a global oper- 
ator. If the threshold value is chosen too large, halo artifacts will result. The size of 
these halos is limited to 10 pixels around any bright features because this is the size 
of the largest center. To demonstrate the effect of this threshold, we have reduced an 
image in size prior to tone mapping and enlarged the tone-mapped result, which is 
shown in Figure 7.39. 

The size of a locally uniform neighborhood is now given by $s_t$. The local world
adaptation value $L_{wa}(x, y)$ is a Gaussian-blurred pixel at scale $s_t$, as follows.


$$
L_{wa}(x, y) = \left(L_w \otimes R_{s_t}\right)(x, y)
$$
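The scale selection can be sketched on a 1-D luminance signal as follows. Several details are assumptions of this sketch rather than part of the operator's definition: the kernel size s is used as the Gaussian standard deviation in pixels, the magnitude of the band-limited contrast is compared against the threshold, and the largest scale is used when no scale exceeds the threshold. The function names are hypothetical.

```python
import math


def gaussian_blur(signal, sigma):
    """Blur a 1-D list with a normalized Gaussian; borders are clamped."""
    radius = int(math.ceil(3.0 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in offsets]
    total = sum(weights)
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(x + i, 0), last)]
            for i, w in zip(offsets, weights)) / total
        for x in range(len(signal))
    ]


def local_adaptation(lum, threshold=0.5, max_scale=10):
    """Return (adaptation levels, chosen scales) for positive luminances."""
    # Blur once per kernel size; scale s needs both R_s and R_2s.
    blurred = {}
    for s in range(1, max_scale + 1):
        for size in (s, 2 * s):
            if size not in blurred:
                blurred[size] = gaussian_blur(lum, float(size))
    adapt, scales = [], []
    for x in range(len(lum)):
        chosen = max_scale  # fallback when no scale exceeds the threshold
        for s in range(1, max_scale + 1):
            center = blurred[s][x]
            contrast = (center - blurred[2 * s][x]) / center
            if abs(contrast) > threshold:
                chosen = s
                break
        scales.append(chosen)
        adapt.append(blurred[chosen][x])
    return adapt, scales


step = [1.0] * 40 + [100.0] * 40
adapt, scales = local_adaptation(step)
print(scales[30:40])
```

In a uniform signal no contrast is ever detected and every pixel falls back to the largest scale, whereas dark pixels a few samples from a bright edge stop at a small scale so that the edge does not leak into their adaptation level.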


Note that the scale $s_t$ will be different for each pixel so that the size of the local
neighborhood over which $L_{wa}$ is computed varies according to image content. The
idea of using the largest possible filter kernel without crossing large contrast steps
produces results that are in some sense equivalent to the output of edge-preserving
smoothing operators such as the bilateral and trilateral filters discussed in
Sections 8.1.2 and 8.1.3.

FIGURE 7.39 Effect of thresholding on results obtained with Ashikhmin’s operator. From left
to right: threshold values are 0.0, 0.5, and 1.0.

Other than the threshold value discussed previously, this operator does not have 
any user parameters, which is good if plausible results need to be obtained auto- 
matically. However, as with several other operators the input needs to be specified in 
appropriate SI units. If the image is in arbitrary units, it needs to be pre-scaled. The 
effect of pre-scaling an image is shown in Figure 7.40. We note that there appears 
to be a discontinuity in visual appearance between the image that was pre-scaled by 
a factor of 10 (top right) and 100 (bottom left in Figure 7.40). We suspect that this 
is due to the C1 discontinuity in the TVI function used in the perceptual capacity
function. The C1 discontinuity in the TVI function is due to the different luminance
levels at which rods and cones in the human visual system operate. 

In summary, Ashikhmin’s operator is based on sufficient knowledge of the hu- 
man visual system to be effective without aiming for completeness. The operator is 
not developed to be predictive but to provide a reasonable hands-off approach to 
producing visually pleasing output in which local contrast is preserved. 


7.3.6 REINHARD ET AL. PHOTOGRAPHIC TONE 
REPRODUCTION 


The problem of mapping a range of world luminances to a smaller range of display 
luminances is not a new problem. Tone reproduction has existed in conventional 
photography since photography was invented. The goal of photographers is often 
to produce renderings of captured scenes that appear realistic. With photographic 
paper (like all paper) being inherently LDR, photographers have to find ways to 
work around the limitations of the medium. 

Although many common photographic principles were developed in the last 150 
years, and a host of media response characteristics were measured, a disconnect 
existed between the artistic and technical sides of photography. Ansel Adams’ zone 
system, which is still in use today, attempts to bridge this gap. It allows the photo- 
grapher to use field measurements to improve the chances of creating a good final 
print. 




FIGURE 7.40 Pre-scaling applied to Ashikhmin’s operator with factors ranging from 0.1 to
10,000. Each successive image was scaled by a factor 10 times the factor of the preceding image.

The zone system may be used to make informed choices in the design of a tone-
reproduction operator [109]. First, a linear scaling is applied to the image, which 
is analogous to setting exposure in a camera. Then, contrast may be locally adjusted 
using a computational model akin to photographic dodging and burning, which is 
a technique to selectively expose regions of a print for longer or shorter periods of 
time. This may bring up selected dark regions, or bring down selected light regions. 

The key of a scene in photography is an indicator of how light or dark the overall 
impression of a scene is. Following other tone-reproduction operators, Reinhard et 
al. view the log average luminance $\bar{L}_w$ (Equation 7.1) as a useful approximation of
a scene’s key. For average-key scenes, the log average luminance should be mapped 
to 18% of the display range, which is in line with common photographic practice 
(although see footnote 4 of Chapter 2). Higher-key scenes should be mapped to a 
higher value, and lower-key scenes should be mapped to a lower value. The value to 
which the log average is mapped is given as a user parameter a. The initial scaling 
of the photographic tone-reproduction operator is then given by the following. 

$$
L_m(x, y) = \frac{a}{\bar{L}_w}\, L_w(x, y)
$$


The subscript m denotes values obtained after the initial linear mapping. Because
this scaling precedes any nonlinear compression, the operator does not necessarily
expect the input to be specified in SI units. If the image is given in arbitrary units, 
the user parameter a could be adjusted accordingly. An example of this parameter’s 
effect is shown in Figure 7.41. For applications that require hands-off operation, 
the value of this user parameter may be estimated from the histogram of the im- 
age [106]. This technique is detailed in Section 7.1.1. 
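The initial scaling may be sketched as follows. The log average is computed here as the exponential of the mean logarithm, with a small offset `delta` added to avoid taking the logarithm of zero. Both that offset and the function names are assumptions of this sketch.

```python
import math


def log_average(luminances, delta=1e-6):
    """Log average luminance of a list of non-negative values."""
    total = sum(math.log(delta + lum) for lum in luminances)
    return math.exp(total / len(luminances))


def scale_to_key(luminances, a=0.18):
    """Map the log average luminance to the key value a."""
    l_avg = log_average(luminances)
    return [a * lum / l_avg for lum in luminances]


world = [1.0, 10.0, 100.0, 1000.0]
scaled = scale_to_key(world)
print(round(log_average(world), 3), round(log_average(scaled), 3))

# Multiplying the input by a constant leaves the result unchanged, which
# is why arbitrary input units are acceptable.
rescaled = scale_to_key([1000.0 * lum for lum in world])
print([round(v, 4) for v in scaled])
print([round(v, 4) for v in rescaled])
```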

Many scenes have a predominantly average dynamic range with a few high- 
luminance regions near highlights or in the sky. Traditional photography uses 
S-shaped transfer functions (sigmoids) to compress both high- and low-luminance 
values while emphasizing the midrange. However, modern photography uses trans- 
fer functions that predominantly compress high luminances. This may be modeled 
with the following compressive function. 


$$
L_d(x, y) = \frac{L_m(x, y)}{1 + L_m(x, y)}
$$

FIGURE 7.41 Pre-scaling the image data is an integral part of the photographic
tone-reproduction operator and may be automated. Here, user parameter a was set to 0.01, 0.04,
0.18 (default), and 0.72.

This function scales small values linearly, whereas higher luminances are com-
pressed by larger amounts. The function has an asymptote at 1, which means that
all positive values will be mapped to a display range between 0 and 1. However, 
in practice the input image does not contain infinitely large luminance values, and 
therefore the largest display luminances do not quite reach 1. In addition, it may be 
artistically desirable to let bright areas burn out in a controlled fashion. This effect 
may be achieved by blending the previous transfer function with a linear mapping, 
yielding the following tone-reproduction operator: 


$$
L_d(x, y) = \frac{L_m(x, y)\left(1 + \dfrac{L_m(x, y)}{L_{\mathrm{white}}^2}\right)}{1 + L_m(x, y)}
$$


This equation introduces a new user parameter, $L_{\mathrm{white}}$, which denotes the smallest
luminance value that will be mapped to white. By default, this parameter is set
to the maximum world luminance (after the initial scaling). For lower-dynamic-
range images, setting $L_{\mathrm{white}}$ to a smaller value yields a subtle contrast enhancement.
Figure 7.42 shows various choices of Lwhite for an LDR image. Note that for hands- 
off operation this parameter may also be estimated from the histogram of the input 
image [106]. 
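
The two global curves discussed above may be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than a reference implementation: the luminances are taken to be pre-scaled and non-negative, the names `simple_curve` and `photographic_global` are ours, and the clamp to 1 for luminances above Lwhite is an illustrative choice for the burn-out behavior.

```python
def simple_curve(l):
    """Compressive curve L / (1 + L); approaches 1 asymptotically."""
    return l / (1.0 + l)


def photographic_global(lum, l_white=None):
    """Apply the global curve with controlled burn-out to a 2D list.

    lum     -- rows of pre-scaled, non-negative luminances
    l_white -- smallest luminance mapped to white; defaults to the
               image maximum (assumed to be greater than zero)
    """
    if l_white is None:
        l_white = max(max(row) for row in lum)
    inv_w2 = 1.0 / (l_white * l_white)
    out = []
    for row in lum:
        # Blend of the compressive curve with a linear term; the clamp lets
        # luminances above l_white burn out to white (illustrative choice).
        out.append([min(1.0, l * (1.0 + l * inv_w2) / (1.0 + l)) for l in row])
    return out


image = [[0.0, 1.0], [3.0, 9.0]]
mapped = photographic_global(image)        # l_white defaults to 9.0
burned = photographic_global(image, 3.0)   # everything from 3.0 upward is white
```

With the default setting the image maximum maps to 1, whereas the smaller Lwhite pushes both bright pixels to white and leaves more of the display range for the darker values.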

The previous equation is a reasonable global tone-reproduction operator. How- 
ever, it may be modified to become a local tone-reproduction operator by applying 
an algorithm akin to photographic dodging and burning. In traditional dodging and 
burning, the area that receives a different exposure from the remainder of the print 
is bounded by sharp contrasts. This is a key observation that should be reproduced 
by any automatic dodge-and-burn algorithm. 

For each pixel, we would therefore like to find the largest surrounding area 
that does not contain any sharp contrasts. A reasonable measure of contrast for 
this purpose is afforded by traditional center-surround computations. A Gaussian- 
weighted average is computed for a pixel (the center), and is compared with a 
Gaussian-weighted average over a larger region (the surround), both centered over 
the same pixel. If there are no significant contrasts in the pixel’s neighborhood, the 
difference of these two Gaussians will be close to 0. However, if there is a contrast
edge that overlaps the surround but not the center Gaussian, the two averages will
be significantly different.

FIGURE 7.42 The Lwhite parameter in the Reinhard photographic tone-reproduction operator
is effective in minimizing the loss of contrast when tone mapping a low dynamic range image. The
value of Lwhite was set to 0.15 in the top left-hand image, and incremented by 0.10 for each
subsequent image.


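If a Gaussian-blurred image at scale s is given by

$$ L_s^{\mathrm{blur}}(x, y) = L_m(x, y) \otimes R_s(x, y), $$

the center-surround mechanism at that scale is computed with

$$ V_s(x, y) = \frac{L_s^{\mathrm{blur}} - L_{s+1}^{\mathrm{blur}}}{2^{\phi} a / s^2 + L_s^{\mathrm{blur}}}. $$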


The normalization by $2^{\phi} a / s^2 + L_s^{\mathrm{blur}}$ allows this result to be thresholded by a
common threshold that is shared by all scales, in that $V_s$ is now independent of
absolute luminance values. In addition, the $2^{\phi} a / s^2$ term prevents the normalization
from breaking for small values of $L_s^{\mathrm{blur}}$. The user parameter φ may be viewed as a
sharpening parameter, the effect of which is shown in Figure 7.43. For small values
of φ, its effect is very subtle. If the value is chosen too large, haloing artifacts may
occur. In practice, a setting of φ = 8 yields plausible results.

This process yields a set of differences of Gaussians, each providing information 
about how much contrast is available within increasingly large areas around the 
pixel of interest. To find the largest area that has relatively low contrast for a given 
pixel, we seek the largest scale Smax for which the difference of Gaussians remains 
below a threshold, as follows. 


$$ s_{\max} : \left| V_{s_{\max}}(x, y) \right| < \varepsilon $$


For this scale, the corresponding center Gaussian may be taken as a local average.
The local operator that implements a computational model of dodging and burning
is then given by the following.

$$ L_d(x, y) = \frac{L_m(x, y)}{1 + L_{s_{\max}}^{\mathrm{blur}}(x, y)} $$

FIGURE 7.43 The sharpening parameter φ in the photographic tone-mapping operator is
chosen to be 4 and 8 (top row) and 16 and 32 (bottom row).


The luminance of a dark pixel in a relatively bright region will satisfy
$L_m < L_{s_{\max}}^{\mathrm{blur}}$, and thus this operator will decrease the display luminance $L_d$,
thereby increasing the contrast at that pixel. This is akin to photographic “dodging.”
Similarly, a pixel in a relatively dark region will be compressed less, and thus “burned.”
In either case the pixel’s contrast relative to the surrounding area is increased.
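
The scale selection for a single pixel may be sketched as follows. The sketch assumes that the Gaussian-blurred luminances at that pixel have already been computed for an increasing list of scales. The threshold value 0.05, the geometric scale spacing of 1.6, the fallback to the smallest scale, and all function names are assumptions made for illustration; only a = 0.18 and φ = 8 are taken from the text.

```python
def center_surround(blur_s, blur_next, s, a, phi):
    """Normalized difference of Gaussians V_s at one pixel."""
    return (blur_s - blur_next) / (2.0 ** phi * a / (s * s) + blur_s)


def select_scale(blur, scales, a=0.18, phi=8.0, eps=0.05):
    """Index of the largest scale reached while |V_s| stays below eps.

    blur[i] is the Gaussian-blurred luminance at this pixel for scales[i].
    Falls back to the smallest scale if the very first test fails.
    """
    chosen = 0
    for i in range(len(blur) - 1):
        v = center_surround(blur[i], blur[i + 1], scales[i], a, phi)
        if abs(v) >= eps:
            break          # a sharp contrast enters the surround here
        chosen = i
    return chosen


def dodge_and_burn(l_m, blur, scales, a=0.18, phi=8.0, eps=0.05):
    """Display luminance computed from the selected local average."""
    i = select_scale(blur, scales, a, phi, eps)
    return l_m / (1.0 + blur[i])


scales = [1.0, 1.6, 2.56, 4.096]   # assumed geometric spacing
flat = [0.5, 0.5, 0.5, 0.5]        # no contrast in the neighborhood
edge = [0.5, 0.5, 5.0, 5.0]        # a bright edge enters at the third scale
```

For the `flat` neighborhood the largest tested scale is selected, whereas for `edge` the search stops before the blurred value is contaminated by the bright region.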

The memory efficiency of the dodge-and-burn version may be increased by re- 
alizing that the scale selection mechanism could be executed on the fly. The original 
implementation computes a Gaussian pyramid as a preprocess. Then, during tone 
mapping for each pixel the most appropriate scale is chosen. Goodnight et al. show 
that the preprocessing step may be merged with the actual tone-reproduction stage 
and thus avoid computing the low-pass images that will not be used [43]. Their 
work also shows how this operator may be implemented in graphics hardware. 

In summary, the photographic tone-reproduction technique [109] exists in both 
global and local variants. For medium-dynamic-range images, the global operator 
is fast and provides sufficient compression. For very high-dynamic-range images, 
local contrast may be preserved better with the local version that implements dodg- 
ing and burning. The local operator seeks for each pixel the largest area that does 
not contain significant contrast steps. This technique is therefore similar to edge- 
preserving smoothing filters such as the bilateral filter discussed in Section 8.1.2. 
We could therefore replace the scale selection mechanism with the more practi- 
cal and efficient bilateral filter to produce a spatially localized average. This average 
would then serve the purpose of finding the average exposure level to which the 
pixel will be adjusted. 


7.3.7 PATTANAIK ADAPTIVE GAIN CONTROL 


Thus far, we have discussed several tone-reproduction operators that compute a lo- 
cal average. The photographic tone-reproduction operator uses a scale-space mech- 
anism to select how large a local area should be and computes a weighted average 
for this local area. It is then used to adjust exposure level. Ashikhmin’s operator 
does the same, but provides an alternative explanation in terms of human vision. 
Similarly, the bilateral filter is effectively an edge-preserving smoothing operator. 
Smoothing by itself can be viewed as computing an average over a local neighborhood.
The edge-preserving properties of the bilateral filter are important because
they allow the space over which the average is computed to be maximized.

The defining characteristic of the bilateral filter is that pixels are averaged
over local neighborhoods, provided their intensities are similar. The bilateral filter
is defined as

$$ L^{\mathrm{smooth}}(x, y) = \frac{1}{w(x, y)} \sum_u \sum_v b(x, y, u, v)\, L(x - u, y - v) \qquad (7.7) $$

$$ w(x, y) = \sum_u \sum_v b(x, y, u, v) $$

$$ b(x, y, u, v) = f\!\left( \sqrt{(x - u)^2 + (y - v)^2} \right) g\big( L(x - u, y - v) - L(x, y) \big), $$


with w() a weight factor normalizing the result and b() the bilateral filter consist- 
ing of components f () and g(). There is freedom to choose the shape of the spatial 
filter kernel f (), as well as the luminance-domain filter kernel g(). Different solu- 
tions were independently developed in the form of SUSAN [118] and the bilateral 
filter [128]. At the same time, independent and concurrent developments led to 
alternative tone-reproduction operators: one based on the bilateral filter [23] and 
one based on the SUSAN filter [96]. 
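
A brute-force version of such a filter may be sketched as follows. The sketch assumes Gaussian shapes for both the spatial kernel f and the luminance-domain kernel g, treats (u, v) as offsets from the pixel of interest so that the spatial distance is the length of the offset, and skips samples outside the image. The parameter names and default values are ours and serve only as an example.

```python
import math


def bilateral_filter(img, radius=2, sigma_s=1.0, sigma_r=0.4):
    """Edge-preserving smoothing of a 2D list of luminance values."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            weight = 0.0
            for v in range(-radius, radius + 1):
                for u in range(-radius, radius + 1):
                    yy, xx = y - v, x - u
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue
                    # Spatial kernel f: falls off with distance to the center.
                    f = math.exp(-(u * u + v * v) / (2.0 * sigma_s ** 2))
                    # Luminance-domain kernel g: falls off with difference in value.
                    d = img[yy][xx] - img[y][x]
                    g = math.exp(-(d * d) / (2.0 * sigma_r ** 2))
                    total += f * g * img[yy][xx]
                    weight += f * g
            # The center sample always contributes weight 1, so weight > 0.
            out[y][x] = total / weight
    return out


step = [[0.0] * 3 + [10.0] * 3 for _ in range(4)]   # sharp vertical edge
smooth_step = bilateral_filter(step)
ripple = [[1.0, 1.2], [1.0, 1.2]]                   # low-contrast variation
smooth_ripple = bilateral_filter(ripple)
```

The sharp edge in `step` survives essentially unchanged because g is close to 0 across it, whereas the small variation in `ripple` is averaged.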

Whereas Durand and Dorsey experimented with Gaussian filters and Tukey’s fil- 
ter, Pattanaik and Yee employed a near box-shaped filter kernel in the luminance do- 
main to steer the amount of compression in their tone-reproduction operator [96]. 
The latter used the output of their version of the bilateral filter as a local adapting 
luminance value, rather than as a mechanism to separate the image into a base layer 
and a detail layer as Durand and Dorsey did. 

Taking their cue from photography, Pattanaik and Yee note that white tends to 
be five times as intense as medium gray and black is one-fifth the luminance of 
medium gray. Their local gain control is derived from a weighted local average in 
which each surrounding pixel is weighted according to its luminance in relation 
to the luminance of the pixel of interest. Pixels more than five times as intense as 
the center pixel, and pixels less than one-fifth its luminance, are excluded from 
consideration. For a circularly symmetric area around pixel (x, y), the local average 
is then computed for all pixels as follows. 


$$ \frac{1}{5} \le \frac{L_w(x - u, y - v)}{L_w(x, y)} \le 5 $$

The circularly symmetric local area is determined by bounding the value of u and
v by the radius r of the area under consideration, as follows.

$$ \sqrt{u^2 + v^2} \le r $$


An alternative notation for the same luminance-domain constraint may be formu- 
lated in the log domain, with the base of the log being 5, as follows. 


$$ \left| \log_5\big( L_w(x - u, y - v) \big) - \log_5\big( L_w(x, y) \big) \right| \le 1 $$


This implies a box filter in the luminance domain and a “box filter” (albeit circularly 
symmetric) in the spatial domain. A box filter in the luminance domain suffices if 
the image consists solely of sharp edges. Smoother high-contrast edges are best 
filtered with a luminance-domain filter that has a somewhat less abrupt cutoff. This 
may be achieved with the following luminance-domain filter kernel g(). 


$$ g(x - u, y - v) = \exp\left( - \left| \log_5\big( L_w(x - u, y - v) \big) - \log_5\big( L_w(x, y) \big) \right|^{25} \right) $$


The spatial filter kernel f () is circularly symmetric and unweighted, as follows. 


$$ f(x - u, y - v) = \begin{cases} 1 & \text{if } \sqrt{u^2 + v^2} \le r \\ 0 & \text{otherwise} \end{cases} $$


The result of producing a filtered image with this filter is an image that is blurred, 
except in areas where large-contrast steps occur. This filter may therefore be viewed 
as an edge-preserving smoothing filter, as are the bilateral and trilateral filters. The 
output of this filter may therefore be used in a manner similar to tone-reproduction 
operators that split an image into a base layer and a detail layer. The base layer is 
then compressed and recombined with the detail layer under the assumption that 
the base layer is HDR and the detail layer is LDR. 
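
The filter described above may be sketched in plain Python as follows. The sketch assumes strictly positive world luminances, skips samples outside the image, and exposes the exponent of the luminance-domain kernel as a parameter. Its default of 25, which gives the near box-shaped cutoff, is our assumption, as are the function and parameter names.

```python
import math


def local_adaptation(lum, radius=2, exponent=25):
    """Weighted local average that ignores pixels far outside a factor of 5."""
    h, w = len(lum), len(lum[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            log_center = math.log(lum[y][x], 5)
            total = 0.0
            weight = 0.0
            for v in range(-radius, radius + 1):
                for u in range(-radius, radius + 1):
                    yy, xx = y - v, x - u
                    inside = 0 <= yy < h and 0 <= xx < w
                    if not inside or u * u + v * v > radius * radius:
                        continue   # spatial kernel f is 0 outside the disc
                    # Luminance-domain kernel g with a soft, near box-shaped cutoff.
                    diff = abs(math.log(lum[yy][xx], 5) - log_center)
                    g = math.exp(-(diff ** exponent))
                    total += g * lum[yy][xx]
                    weight += g
            out[y][x] = total / weight   # center pixel guarantees weight >= 1
    return out


scene = [[1.0, 1.0, 1.0],
         [1.0, 1000.0, 1.0],
         [1.0, 1.0, 1.0]]
adapt = local_adaptation(scene)
```

In this example the bright center pixel does not leak into the local averages of its dark neighbors, and its own local average remains at its own luminance.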

Alternatively, the output of this filter may be viewed as a local adapting lumi- 
nance. Any of the global operators that make use of a global average may thus be 
extended to become local operators. For instance, the output of any edge-preserving 
smoothing operator, as well as the scale selection mechanism of Reinhard et al.'s 
photographic operator, may serve as a local adaptation luminance. In each case, 
the typical trade-off between amount of achievable compression and visibility of
haloing artifacts will return. However, by using edge-preserving smoothing operators
or the aforementioned scale selection mechanism, the local average is ensured
to be relatively close to the pixel value itself. Although halos may not be avoided 
altogether, they are minimized with these approaches. 


7.3.8 YEE SEGMENTATION-BASED APPROACH 


Many HDR images contain large areas that are relatively dark and large areas that are 
bright. An often-quoted example of such a configuration is a room with a window. 
In such cases, it may be desirable to apply different compression functions for the 
bright and dark regions. 

Any algorithm that uses a local adaptation level, such as the semisaturation
constant in the Michaelis-Menten equations (6.1), may be modified to explicitly
use an adaptation level based on segmentation of the bright and dark areas into
separate regions.

At least two operators are currently known that segment an image into separate 
regions for the purpose of tone reproduction [67,150]. In this section, we discuss 
Yee and Pattanaik’s approach. They effectively segment the image into separate re- 
gions, and then determine a suitable adaptation level for each region [150]. Their 
approach consists of the four following steps. 


1 Segmentation: Based on the histogram of a density representation, the image is
segmented into regions. A histogram is created with a specific number of
bins (as discussed below).

2 Grouping: Pixels in the segmented image are grouped, and each pixel within a 
group is assigned the average density of the group. 

3 Assimilation: Small groups and groups with only one neighbor are merged. 
The result of the assimilation process is called a layer. 

4 Layer averaging: The previous three steps are repeated several times for his- 
tograms with different bin sizes (and numbers of bins), and for each pixel 
the results are averaged. 


After layer averaging is complete, the resulting image provides a local adaptation
level (in the log domain) for each pixel. Several user parameters are introduced to
steer the quality of the results. The layer-averaging step has the effect of smoothing
the adaptation level image. This image should not contain sharp discontinuities, in
that such discontinuities would lead to artifacts in the final tone-mapped result. By
choosing more layers, each created with a histogram with a different spacing of
bins, a smoother result is obtained. Hence, the total number of layers is an important
parameter, trading computation time for visual quality. The number of layers
required to minimize artifacts depends on the composition of the image and on its
dynamic range. The bin size $B_n$ is determined by the total number of layers N, the
current layer number n, and two further user parameters that limit the minimum
and maximum bin size ($B_{\min}$ and $B_{\max}$), as follows.

$$ B_n = B_{\min} + (B_{\max} - B_{\min}) \frac{n}{N - 1} $$

FIGURE 7.44 This image is segmented into 10 bins, and then encoded with a separate gray level
per bin.

FIGURE 7.45 This image is segmented into 10 bins, grouped, and then encoded with a separate
gray level per group.

The minimum and maximum bin sizes are by default set to 0.5 and 1.0, respectively.
Given the bin size $B_n$ for the current layer, each pixel may be categorized as
belonging to bin b, as follows.

$$ b_{x, y} = \frac{D(x, y) - D_{\min}}{B_n} $$


An example is shown in Figure 7.44, where each gray level indicates a separate
bin. Once each pixel is labeled with its bin number b, pixels may be grouped. An
image after grouping is shown in Figure 7.45. During the grouping process, the
average density of the group is determined and stored. The grouping makes use of
a recursive flood-fill algorithm. A potential problem with this approach is that if
large areas are filled recursively the number of recursive calls may cause the system
to run out of stack space. The flood-fill algorithm also keeps track of how many
pixels are added to a group. After each group is filled, the average density of the
group is computed.

FIGURE 7.46 Layer averaging demonstrated with a total of 10 layers.
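
The segmentation and grouping steps for a single layer may be sketched as follows. Note that this sketch swaps the recursive flood fill for an explicit stack, which sidesteps the stack-space problem mentioned above. It further assumes 4-connected neighborhoods, truncation of the bin number to an integer, and a bin size that is interpolated linearly between the minimum for the first layer and the maximum for the last. All function names are ours.

```python
def bin_size(n, num_layers, b_min=0.5, b_max=1.0):
    """Bin size for layer n, interpolated linearly (assumed) from b_min to b_max."""
    if num_layers < 2:
        return b_min
    return b_min + (b_max - b_min) * n / (num_layers - 1)


def label_bins(density, size):
    """Assign each pixel the index of the histogram bin its density falls in."""
    d_min = min(min(row) for row in density)
    return [[int((d - d_min) / size) for d in row] for row in density]


def group_pixels(bins, density):
    """Group 4-connected pixels that share a bin; return labels and averages."""
    h, w = len(bins), len(bins[0])
    group = [[-1] * w for _ in range(h)]
    averages = []
    for y0 in range(h):
        for x0 in range(w):
            if group[y0][x0] != -1:
                continue
            gid, label = len(averages), bins[y0][x0]
            group[y0][x0] = gid
            stack = [(y0, x0)]          # explicit stack replaces recursion
            total, count = 0.0, 0
            while stack:
                y, x = stack.pop()
                total += density[y][x]
                count += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and group[ny][nx] == -1 and bins[ny][nx] == label):
                        group[ny][nx] = gid
                        stack.append((ny, nx))
            averages.append(total / count)   # average density of this group
    return group, averages


density = [[0.0, 0.1, 2.0],
           [0.0, 0.2, 2.2]]
bins = label_bins(density, bin_size(9, 10))   # last of 10 layers, bin size 1.0
group, averages = group_pixels(bins, density)
```

Here the four dark pixels form one group and the two bright pixels another, each with its own average density.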

The assimilation process merges small groups with larger ones. For details, we 
refer to the original paper [150]. For the images shown in this section, we have 
omitted this step. It is possible that for certain image compositions the assimilation 
step produces an improved estimate of local adaptation levels, but we have found 
that for the test images used here (in combination with the sigmoidal compression 
function we used) the results without the assimilation step are very good. 

In our implementation, the average density of a group is used in the layer-averaging
process. Because for each pixel the group number is known, the average
density assigned to each pixel is found by computing the following.

    D_n^av(x, y) = group_list[group[y][x]].lum_av

FIGURE 7.47 Final result of tone mapping with Yee and Pattanaik’s method for deriving local
adaptation levels. This image was created with layer averaging over 10 layers to allow comparison
with Figure 7.46.


The previous steps are repeated for all layer numbers 0, ..., N - 1, and the results are
averaged, as follows.

$$ L_a(x, y) = \exp\left( \frac{1}{N} \sum_{n=0}^{N-1} D_n^{\mathrm{av}}(x, y) \right) $$

FIGURE 7.48 Image tone mapped with a global tone-reproduction operator (left), and with
the global adaptation level replaced by local adaptation levels based on segmentation (right).

The resulting adaptation levels are shown in Figure 7.46. The final result for a total 
of 10 layers is shown in Figure 7.47. For this image, 10 layers are not quite sufficient 
for an artifact-free result. Rather, this choice allows a direct comparison between 
Figures 7.46 and 7.47. 
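
The layer-averaging step itself is small. The following sketch assumes that each layer is a 2D list holding, per pixel, the average density (natural logarithm of luminance) of the group the pixel belongs to in that layer. The function name is ours.

```python
import math


def average_layers(layers):
    """Per-pixel adaptation luminance from N layers of average densities."""
    n = len(layers)
    h, w = len(layers[0]), len(layers[0][0])
    # Averaging in the log domain and exponentiating yields a geometric mean.
    return [[math.exp(sum(layer[y][x] for layer in layers) / n)
             for x in range(w)]
            for y in range(h)]


layers = [[[0.0, math.log(4.0)]],
          [[0.0, math.log(16.0)]]]
adaptation = average_layers(layers)   # roughly [[1.0, 8.0]]
```

Because the average is taken over densities, a pixel whose group averages differ between layers receives their geometric mean, which is what smooths the boundaries between regions.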

To demonstrate the effect of this approach in deriving the local adaptation lumi- 
nance for each pixel, we adapted Reinhard and Devlin’s photoreceptor-based algo- 
rithm to accept the previously cited local averages. Figure 7.48 shows the result of 
this operator with a global adaptation level (left) and locally computed adaptation 
levels obtained with the previous segmentation procedure (right). 

The effect of varying the number of layers on the quality of the results is shown 
in Figure 7.49, where the number of layers was varied between 5 and 30. It is clear 
that the smoothing effect of averaging multiple layers is important in avoiding visual 
artifacts. The number of layers required varies with the dynamic range of the image, 
as well as the composition of the image. For this particular example, 30 layers are 
sufficient. 

In summary, Yee and Pattanaik propose to segment the image into regions
and compute an adaptation level for each region. By smoothing the results
(accomplished by repeating the segmentation for different histogram bin sizes), local
adaptation levels are computed that are suitable for steering local tone-reproduction
operators. The usefulness of this approach is demonstrated in this section by augmenting
the photoreceptor-based operator with these local adaptation levels.

FIGURE 7.49 The number of layers computed with Yee and Pattanaik’s segmentation approach
is varied between 5 and 30 (with increments of five layers) in this image sequence. If this number
is too small, artifacts occur. For this example, artifacts are removed completely when 30 layers are
used.


7.4 SUMMARY 


Tone-reproduction operators may reduce the dynamic range of images by applying 
the same function to all pixels, or they may compress pixels based on their value and 
the values of a local neighborhood of pixels. The former category is computationally 
efficient and generally suitable for medium-dynamic-range images. 

Extra compression may be achieved by making the compressive function depen- 
dent on neighboring pixels. This may be achieved by dividing the input image by 
a blurred version of the image, in which case the amount of blur to apply should 
be large to avoid haloing artifacts. A blurred version of the image may also be seen 
as an adaptation level. Then it can be used as the semisaturation constant in a
sigmoidal (or S-shaped) function. In this case, artifacts are minimized by choosing a
small filter kernel, typically only a few pixels wide. Small filter kernels have the
added advantage of low computational cost.

In either case, artifacts may be minimized by using filter kernels that do not cross 
sharp image contrasts. Edge-preserving smoothing operators such as the bilateral fil- 
ter, as well as the scale selection mechanism employed by the photographic operator, 
are examples of techniques that avoid blurring across stark contrasts and therefore 
show fewer artifacts than operators that blur each pixel by the same amount. 


Frequency Domain and Gradient Domain Tone Reproduction


In addition to local and global operators, 
there are two other classes of operators 
that work in a fundamentally different way. 
First, it may be possible under favorable 
conditions to separate illuminance from 
surface reflectance. By compressing only 
the illuminance component, an image may 
be successfully reduced in dynamic range. 
Second, we may exploit the fact that an im- 
age area with a high dynamic range tends to exhibit large gradients between neigh- 
boring pixels. This leads to a solution whereby the image is differentiated. Then the 
gradients are manipulated before the result is integrated into a compressed image. 
Frequency-dependent operators are interesting from a historical perspective as 
well as for the observations about image structure they afford. These algorithms 
may therefore help us better understand the challenges we face when preparing 
HDR images for display. The following also explores gradient domain operators, in 
that they are algorithmically related to frequency domain operators. 


8.1 FREQUENCY DOMAIN OPERATORS 


Tone reproduction and dynamic range reduction are generally thought of as fairly 
recent developments. The problem was introduced to the field of lighting design in 
1984 [86], and to the computer graphics community in 1993 [130,131]. However,
HDR images and the problem of dynamic-range reduction are as old as the field
of photography, for which the printing process may be seen as a tone-mapping technique.
The problem also surfaced again with the invention of digital images. The
first digital images were scanned with a bit depth of 12 bits, but could only be
displayed with a bit depth of 8 bits. As a result, the first digital images had to be 
tone mapped prior to display. To our knowledge, the first digital tone-reproduction 
operator was published in 1968 by Oppenheim and colleagues [91]. 

Although this operator appears to be largely forgotten, the work itself contains 
several key ideas (including homomorphic filtering) that have found their way 
into numerous other tone-reproduction operators. In addition, the resulting tone- 
reproduction operator produces visually appealing output for a variety of images 
and perhaps deserves more attention than it currently receives. 

Oppenheim’s operator is a frequency-dependent compressor, in which low fre- 
quencies are attenuated more than higher frequencies. This approach was recently 
also taken by a technique called bilateral filtering [23] (Section 8.1.2). This term refers 
to an edge-preserving smoothing technique that forms the basis for various image- 
processing tasks, including tone reproduction. The bilateral filtering technique is 
used to separate an image into a base layer and a detail layer. The base layer tends 
to be low frequency and HDR, whereas the detail layer is high frequency and LDR. 
The tone-reproduction operator then proceeds by compressing the base layer before 
recombining it with the detail layer. At the same time, the output of the bilateral 
filter may be seen as providing a local adaptation value for each pixel, and therefore 
classification of this algorithm as a local operator would have been equally valid. 

A similar separation into base and detail layers may be achieved with the trilateral 
filter [10], which is an extension of the bilateral filter. The difference between this 
and the bilateral filter lies in the technique used to separate the image into two 
layers. 

All three techniques, however, apply a compression scheme that is frequency 
dependent, and thus they are grouped in this chapter. In the following subsections, 
each of these techniques is presented in more detail. 


8.1.1 OPPENHEIM FREQUENCY-BASED OPERATOR 


Under simplified assumptions, such as a scene consisting of diffuse objects only 
and no directly visible light sources, image formation may be thought of as the 
product of illuminance and reflectance. As indicated in Section 7.1.3, the illuminance
component is then HDR, whereas the reflectance component is not. It would
therefore be advantageous if we could separate the two components, and perform
dynamic-range compression on the illuminance component only. 

This approach implicitly assumes that the surfaces in a scene are diffuse. This is
approximately true for many objects, but the method ignores high-frequency
HDR phenomena such as specular reflections, caustics, and directly visible light 
sources. We would therefore not recommend this approach for images depicting 
these types of lighting. 

For the remaining class of images, separation of reflectance and illuminance is 
to some degree possible by observing that illumination varies slowly over the image,
whereas reflectance is sometimes static and sometimes dynamic [91]. This is
because objects tend to have well-defined edges and vary in size and texture. As 
such, partially independent processing of illuminance and reflectance is possible in 
the frequency domain. 

Oppenheim et al. therefore suggest applying a whitening filter to the density rep- 
resentation of an image, which attenuates low frequencies while preserving higher 
frequencies. This is based on the observation that density representations of images 
tend to show a sharp peak in the low frequencies, with a plateau for medium and 
high frequencies. 

As an aside, whitening is the process in which the amplitude of the Fourier 
representation is altered such that all frequencies carry an equal amount of energy. 
This is generally achieved by amplifying higher frequencies. The opposite approach, 
in which higher frequencies are attenuated, has the effect of blurring the image. 
These two effects are demonstrated in Figure 8.1. 

Frequency-sensitive attenuation of an image thus starts by taking the logarithm
of each pixel to compute densities. Then the FFT is computed on the density
representation so that low frequencies may be attenuated more than high frequencies.
The inverse Fourier transform is then applied to return to a density representation.
In turn, the density image is exponentiated to yield a displayable image. For
a Fourier-transformed density image, we experimented with the following attenuation
function.

$$ s(f) = (1 - c) + c\, \frac{k f}{1 + k f} $$

Both the real and imaginary components of each Fourier coefficient are multiplied
by this scaling (which attenuates the amplitude and leaves the phase unchanged). The
scaling depends on frequency f and takes two user parameters c and k. The user
parameter c controls the maximum amount of attenuation applied to the zero-frequency
DC (direct current) component, whereas the k user parameter determines how rapidly
the slope reaches the plateau of 1.

FIGURE 8.1 The image in the middle was blurred by attenuating higher frequencies (top left)
and whitened by amplifying higher frequencies (bottom). (Image courtesy of the Albin Polasek
Museum, Winter Park, Florida.)

A reasonable default value for c is 0.5; in general, this parameter lies between
0 and 1. The scaling functions s(f) spanned by different choices of c are plotted
in Figure 8.2. An HDR image compressed with different values for c is shown in
Figure 8.3.

FIGURE 8.2 Compression curve as a function of parameter c. To demonstrate Oppenheim’s
operator, the effect of the parameter choice of c on the scaling function s(f) is shown.

The k parameter could be initialized to 0.01, with a sensible range for this pa- 
rameter being [0.001,0.02]. Its impact on the shape of the scaling function s( f) is 
shown in Figure 8.4. Images compressed with different values of k are presented in 
Figure 8.5. For smaller values of k, the plateau at which no attenuation is applied 
occurs for higher frequencies and thus the image is compressed more. The higher 
the value of k the sooner the plateau is reached, and less dynamic-range reduction 
is achieved. 
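
The role of the two parameters may be illustrated with a small sketch that applies the attenuation to a one-dimensional strip of luminances. To keep the example self-contained it uses a naive discrete Fourier transform rather than an FFT and measures frequency in cycles per strip length. Both are assumptions made for illustration, as are the function names.

```python
import cmath
import math


def oppenheim_scale(f, c=0.5, k=0.01):
    """Scaling s(f): equals 1 - c at f = 0 and approaches 1 for large f."""
    return (1.0 - c) + c * (k * f) / (1.0 + k * f)


def attenuate_1d(lum, c=0.5, k=0.01):
    """Homomorphic attenuation of a 1D strip of positive luminances."""
    n = len(lum)
    dens = [math.log(v) for v in lum]   # density (log) domain
    # Naive O(n^2) discrete Fourier transform, used here instead of an FFT.
    spec = [sum(dens[t] * cmath.exp(-2j * math.pi * q * t / n) for t in range(n))
            for q in range(n)]
    # Frequency of coefficient q, counting negative frequencies symmetrically.
    scaled = [spec[q] * oppenheim_scale(min(q, n - q), c, k) for q in range(n)]
    back = [sum(scaled[q] * cmath.exp(2j * math.pi * q * t / n)
                for q in range(n)).real / n
            for t in range(n)]
    return [math.exp(d) for d in back]   # return to linear values


flat = attenuate_1d([math.exp(2.0)] * 8)   # only a DC component is present
strip = attenuate_1d([1.0, 100.0] * 4)     # alternating dark and bright samples
```

For the constant strip only the DC component is present, so its density is multiplied by 1 - c. For the alternating strip the contrast between dark and bright samples is reduced but not removed.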

In summary, Oppenheim et al. were the first to address the dynamic-range reduction
problem. They proposed to attenuate low frequencies in the density (log)
domain, while preserving higher frequencies. This type of processing is called
homomorphic filtering, which affords partially independent processing of illuminance
and reflectance. They observed that reflectance is typically high frequency and LDR,
whereas illuminance produces slow gradients with an arbitrarily high dynamic range.
For images depicting sharp shadow boundaries, participating media, specular highlights,
or directly visible light sources, this separation may not always be performed cleanly and
the method may therefore not always yield satisfactory results.

Aspects of this algorithm, including homomorphic filtering (Section 7.1.3),
separation of the image into illuminance and reflectance, and the concept of tone
reproduction, were first introduced in Oppenheim’s work. With a suitable choice
of the parameters c and k (introduced by us), this algorithm produces reasonable
output, despite the theoretical restrictions mentioned previously.

FIGURE 8.3 The effect of different values of c in Oppenheim’s operator. In reading order, c is
given values of 0.1, 0.3, 0.5, 0.7, and 0.9.

FIGURE 8.4 Compression curve as a function of parameter k. Effect of the parameter choice of k
on the scaling function s(f) in Oppenheim’s operator.

FIGURE 8.5 The effect of different values of k in Oppenheim’s operator. In reading order, k is
increased from 0.002 to 0.004, 0.008, and 0.016.


8.1.2 DURAND BILATERAL FILTERING 


The idea that an image may be separated into a high-frequency component that 
contains only LDR information and a low-frequency component with a high dynamic range is
explicitly exploited by Oppenheim’s operator by attenuating low frequencies in the 
Fourier domain. Separation of an image into separate components whereby only 
one of the components needs to be compressed may also be achieved by applying 
an edge-preserving smoothing operator. 

Durand and Dorsey introduced the bilateral filter to the computer graphics com- 
munity and showed how it may be used to help solve the tone-reproduction prob- 
lem [23]. Bilateral filtering is an edge-preserving smoothing operator that effec- 
tively blurs an image but keeps sharp edges intact. An example is shown in Fig- 
ure 8.6, in which the smoothed image is shown on the right. Edges in this image 
are preserved (compare with the unprocessed image on the left), whereas interior 
regions have reduced detail. This section introduces a tone-reproduction operator 
that uses bilateral filtering and goes by the same name. 

Blurring an image is usually achieved by convolving the image with a Gaussian 
filter kernel. The bilateral filter extends this idea by reducing the weight of the 
Gaussian kernel if the density difference is too large (see Equation 7.7). A second 
Gaussian is applied to density differences. Following Oppenheim, this method op- 
erates on a density image, rather than on linear values. 

The result of this computation, as seen in Figure 8.6, is to some extent analogous
to the illuminance component as discussed by Oppenheim et al. [91]. From the
input image and this illuminance image, the reflectance image may be reconstructed
by dividing the input image by the illuminance image. The smoothed image is known as the
base layer, whereas the result of this division is called the detail layer. Note that the
base and detail layers do not necessarily split the image into an illuminance and
reflectance component. This method does not make the implicit assumption that
the scene depicted is predominantly diffuse.

FIGURE 8.6 The image on the left was smoothed with a bilateral filter, resulting in the image
on the right.

Examples of an HDR input image, an HDR base layer, and an LDR detail layer are 
shown in Figure 8.7. In this figure, the bilateral filter is applied to the luminance 
channel only. To reconstruct the base layer in color, we replaced the luminance chan- 
nel of the image (in Yxy color space) with this output, exponentiated the result to 
yield a linear image, and converted to RGB. The detail layer was reconstructed in a 
similar manner. 

After the bilateral filter is applied to construct base and detail layers in the loga- 
rithmic domain, the dynamic range may be reduced by scaling the base layer to a 
user-specified contrast. The two layers are then recombined and the result is expo- 
nentiated and converted to RGB to produce the final displayable result. 
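The steps above can be sketched on a single scanline of luminance values. This is a minimal illustration under stated assumptions, not Durand and Dorsey's implementation: the bilateral filter is evaluated by brute force, the names `sigma_s`, `sigma_r`, and `target_range` are our own, and we assume that the base layer is scaled by the ratio of target range to base-layer range and shifted so that its brightest value maps to zero.

```python
import math

def bilateral_1d(density, sigma_s=2.0, sigma_r=0.4):
    """Brute-force bilateral filter on a list of log-luminance (density) values."""
    radius = int(math.ceil(3 * sigma_s))
    out = []
    for i, centre in enumerate(density):
        total, weight = 0.0, 0.0
        for j in range(max(0, i - radius), min(len(density), i + radius + 1)):
            f = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))                # spatial Gaussian
            g = math.exp(-((density[j] - centre) ** 2) / (2 * sigma_r ** 2))  # Gaussian on density differences
            total += f * g * density[j]
            weight += f * g
        out.append(total / weight)
    return out

def tone_map_scanline(luminance, target_range=5.0):
    density = [math.log(v) for v in luminance]       # natural-log density image
    base = bilateral_1d(density)                     # HDR base layer
    detail = [d - b for d, b in zip(density, base)]  # LDR detail layer (a division in linear space)
    span = max(base) - min(base)
    scale = min(1.0, target_range / span) if span > 0 else 1.0  # assumed compression rule
    top = max(base)
    compressed = [(b - top) * scale for b in base]   # brightest base value maps to 0
    return [math.exp(c + d) for c, d in zip(compressed, detail)]

scanline = [0.01, 0.012, 0.011, 0.01, 5000.0, 5200.0, 5100.0, 5000.0]
mapped = tone_map_scanline(scanline)
```

The input spans roughly 13 natural-log units. After compression the base layer spans 5, while the small variations within each region pass through the detail layer unchanged.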

The amount of compression applied to the base layer is user specified, but Durand 
and Dorsey note that a target dynamic range of about 5 log units suffices for many 
images.¹ For images that show light sources directly, this value may be adjusted. 
The effect of this parameter is shown in Figure 8.8, in which the contrast of the 
base layer was varied between 2 log units and 7 log units. 

¹ We use the natural logarithm in this case. 

FIGURE 8.7 HDR image tone mapped with bilateral filtering (left). The corresponding base 
layer and detail layers are shown in the right-hand and bottom images. 

FIGURE 8.8 Bilateral filtering results with varying amounts of compression applied to the 
base layer. The dynamic range of the base layer was set between 2 log units (top left-hand 
image) and 7 log units (bottom right-hand image). 

Bilateral filtering may be implemented directly in image space, but the convolution 
with a spatial Gaussian modulated by a Gaussian in density differences is relatively 
expensive to compute. In addition, the second Gaussian makes this method 
unsuitable for execution in the Fourier domain. Durand and Dorsey show how these 
disadvantages may be overcome by splitting the density differences into a number of 
segments [23]. The results are then recombined, yielding an approximate solution 
that in practice is indistinguishable from accurate spatial processing. The 
computation is given by the following. 


\[
\begin{aligned}
D_j^{\mathrm{smooth}}(x, y) &= \frac{1}{k_j(x, y)} \sum_u \sum_v b_j(x, y, u, v)\, D(x - u, y - v) \\
k_j(x, y) &= \sum_u \sum_v b_j(x, y, u, v) \\
b_j(x, y, u, v) &= f\!\left(\sqrt{(x - u)^2 + (y - v)^2}\right) g\bigl(D(x - u, y - v) - D_j\bigr)
\end{aligned}
\]


Here, the values $D_j$ form a quantized set of possible values for pixel (x, y). The 
final output for this pixel is a linear combination of the output of the two smoothed 
values $D_j^{\mathrm{smooth}}$ and $D_{j+1}^{\mathrm{smooth}}$. These two values are 
chosen such that $D_j$ and $D_{j+1}$ are the closest two values to the input density D 
of pixel (x, y). 


For each segment j, the previous equation may be executed in the Fourier do- 
main, thus gaining speedup. The number of segments depends on the dynamic 
range of the input image, as well as the choice of standard deviation for the Gaussian 
g(), which operates on density differences. A suitable choice for this standard de- 
viation is about 0.4. The computation time of the bilateral filter depends on the 
number of segments. There is therefore a trade-off between computation time and 
visual quality, which may be chosen by specifying this standard deviation. 
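The segment-based approximation can be sketched in one dimension as follows. Several details here are assumptions rather than facts from the text: the number of segments is taken as the density range divided by the standard deviation (rounded up), which is consistent with the segment counts reported in this section; the per-segment convolution is written as a direct sum, where a real implementation would use an FFT; and the function and parameter names are hypothetical.

```python
import math

def gaussian(x, sigma):
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def piecewise_bilateral_1d(density, sigma_s=2.0, sigma_r=0.4):
    lo, hi = min(density), max(density)
    n_seg = max(1, math.ceil((hi - lo) / sigma_r))  # assumed rule: range / sigma_r
    step = (hi - lo) / n_seg if hi > lo else 1.0
    radius = int(math.ceil(3 * sigma_s))
    n = len(density)
    layers = []                                     # one smoothed signal per value D_j
    for j in range(n_seg + 1):
        d_j = lo + j * step
        g = [gaussian(d - d_j, sigma_r) for d in density]
        layer = []
        for i in range(n):
            num = den = 0.0
            for k in range(max(0, i - radius), min(n, i + radius + 1)):
                w = gaussian(i - k, sigma_s) * g[k]  # for fixed j this is a plain convolution
                num += w * density[k]
                den += w
            layer.append(num / den if den > 0.0 else d_j)
        layers.append(layer)
    out = []
    for i, d in enumerate(density):
        pos = (d - lo) / step
        j = min(int(pos), n_seg - 1)                # D_j and D_{j+1} bracket the input density
        t = pos - j
        out.append((1.0 - t) * layers[j][i] + t * layers[j + 1][i])
    return out, n_seg

density = [0.0, 0.1, 0.0, 0.1, 4.0, 4.1, 4.0, 4.1]
smoothed, n_seg = piecewise_bilateral_1d(density)
```

Because the weight g() no longer depends on the centre pixel once D_j is fixed, each layer is an ordinary (linear) convolution, which is what makes execution in the Fourier domain possible.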

We have experimented with different values and show the results in Figure 8.9. 
For this particular image, the choice of standard deviation has a relatively small effect 
on its visual appearance. However, this parameter directly influences the number of 
segments generated, and thus affects the computation time. For this image, the 
largest standard deviation we chose was 8 log units, resulting in the creation of 
two segments. For values close to the default of 0.4, the number of segments is 
much higher due to the image’s high dynamic range. This image was split into 
19 segments for a standard deviation of 0.5, and into 38 segments for a standard 
deviation of 0.25. The computation times recorded for these images are graphed in 
Figure 8.10. 

This computation time is substantially higher than those reported by Durand 
and Dorsey [23], most likely because the dynamic range of this image is higher 
than many of their examples. In this chapter, we use a standard deviation of 0.4 as 
recommended in the original paper, but note that discrepancies in reported 
computation times may be due to the choice of images. 

FIGURE 8.9 Bilateral filtering showing results for different choices of standard deviation 
of the Gaussian filter operating on density differences, starting at 0.25 for the top 
left-hand image and doubling for each subsequent image. The bottom right-hand image was 
therefore created with a standard deviation of 8 log units. 


FIGURE 8.10 Computation time (in seconds) of Durand and Dorsey’s bilateral filter as a 
function of the standard deviation of the Gaussian filter. 

Further, Durand and Dorsey observed that bilateral filtering aims at low-pass 
filtering, and thus for most of the computations the full resolution of the input image 
is not required. It is therefore possible to sample the image using nearest-neighbor 
downsampling, perform bilateral filtering, and then upsample the results to the full 
resolution. This significantly reduces the computational cost of the algorithm for 
downsampling factors of up to 10 to 25. Higher factors will not yield a further re- 
duction in computation time, because upsampling and linear interpolation will then 
start to dominate the computation. The visual difference between no downsampling 
and downsampling within this range is negligible. We therefore downsampled all 
results in this section with a factor of 16. 
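The downsampling strategy can be illustrated with a short one-dimensional sketch. The helper names are hypothetical, and `smooth` stands for whatever low-pass or bilateral filter is applied at the reduced resolution (its spatial parameters would presumably need to be scaled by the same factor).

```python
def downsample_nearest(signal, factor):
    """Nearest-neighbor downsampling: keep every factor-th sample."""
    return signal[::factor]

def upsample_linear(coarse, factor, length):
    """Linear interpolation back to the original length (needs at least two coarse samples)."""
    out = []
    for i in range(length):
        pos = i / factor
        j = min(int(pos), len(coarse) - 2)
        t = min(pos - j, 1.0)            # clamp instead of extrapolating past the last sample
        out.append((1.0 - t) * coarse[j] + t * coarse[j + 1])
    return out

def filter_at_low_resolution(signal, factor, smooth):
    coarse = smooth(downsample_nearest(signal, factor))
    return upsample_linear(coarse, factor, len(signal))

ramp = [float(i) for i in range(17)]
restored = filter_at_low_resolution(ramp, 4, lambda s: s)  # identity filter for illustration
```

With the identity filter a linear ramp is reproduced exactly, which shows that the resampling itself only loses content that a low-pass filter would remove anyway.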

In summary, bilateral filtering is a worthwhile technique that achieves a hands- 
off approach to tone reproduction. The method is able to smooth an image without 
blurring across sharp edges. This makes the method robust against outliers and 
other anomalies. The method splits a density image into an HDR and an LDR layer. 
The HDR layer is then compressed and recombined with the other layer. The result 
is exponentiated to form an LDR image. Various techniques are available to speed 
up the process. 


8.1.3 CHOUDHURY TRILATERAL FILTERING 


Although the bilateral filter has attractive features for edge-preserving filtering, 
Choudhury and Tumblin note that this filter also has certain drawbacks. In par- 
ticular, the filter smooths across sharp changes in the gradients of the image, and 
the filter poorly smooths high-gradient and high-curvature regions [10]. 

The trilateral filter aims to overcome these limitations by extending the bilateral 
filter. In fact, two modified versions of the bilateral filter are applied in succes- 
sion. The algorithm starts by computing a density image from a luminance image, 
whereupon image gradients are computed. These gradients are then smoothed and 
used as an indicator of the amount by which the bilateral filter should be tilted to 
adapt to the local region. The smoothing itself is achieved through bilateral filtering. 
Figure 8.11 shows images of the various steps involved in the algorithm. 

With the filter kernel given by b(x, y, u, v), the bilaterally smoothed tilting 
vector A may be computed for each pixel as follows. 


\[
\begin{aligned}
A(x, y) &= \frac{1}{w(x, y)} \sum_u \sum_v b(x, y, u, v)\, \nabla D_{\mathrm{in}}(x - u, y - v) \\
w(x, y) &= \sum_u \sum_v b(x, y, u, v) \\
b(x, y, u, v) &= f\!\left(\sqrt{(x - u)^2 + (y - v)^2}\right)
  \times g\bigl(\lVert \nabla D_{\mathrm{in}}(x - u, y - v) - \nabla D_{\mathrm{in}}(x, y) \rVert\bigr)
\end{aligned}
\]


The filter output is normalized by the weight factor w. The gradients of the input 
are computed using forward differences, as follows. 


\[
\nabla D_{\mathrm{in}}(m, n) \approx \bigl(D_{\mathrm{in}}(m + 1, n) - D_{\mathrm{in}}(m, n),\; D_{\mathrm{in}}(m, n + 1) - D_{\mathrm{in}}(m, n)\bigr)
\]


If we were to apply a bilateral filter to the input image after tilting the filter by 
A(x, y), its Gaussian constituents f() and g() would no longer be orthogonal. 
Therefore, rather than computing a spatial weight s() for neighboring densities 
$D(x - u, y - v)$ by measuring the spatial distance between (x, y) and 
$(x - u, y - v)$, this distance is now measured through a plane of density values with 
orientation $P(x - u, y - v)$. This orientation is a scalar value that may be computed 
as follows. 


\[
P(x - u, y - v) = D_{\mathrm{in}}(x, y) + A(x, y) \cdot (u, v)^T
\]


Before computing trilateral output values, $P(x - u, y - v)$ is subtracted from the 
input density values to compute a local detail signal $D_\Delta(x - u, y - v)$, as 
follows. 


\[
D_\Delta(x - u, y - v) = D_{\mathrm{in}}(x - u, y - v) - P(x - u, y - v)
\]


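The output of the trilateral filter $D^{\mathrm{smooth}}(x, y)$ is then obtained as follows. 


\[
\begin{aligned}
D^{\mathrm{smooth}}(x, y) &= D_{\mathrm{in}}(x, y) + \frac{1}{w_\Delta(x, y)} \sum_u \sum_v b_\Delta(x, y, u, v)\, D_\Delta(x - u, y - v) \\
w_\Delta(x, y) &= \sum_u \sum_v b_\Delta(x, y, u, v) \\
b_\Delta(x, y, u, v) &= f\!\left(\sqrt{(x - u)^2 + (y - v)^2}\right) g\bigl(D_\Delta(x - u, y - v)\bigr)\, \delta_A(x - u, y - v)
\end{aligned}
\]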


By tilting the trilateral filter it is possible to smooth more accurately in high-gradient 
regions, but this comes at the cost of a potential for extending the filter window 
beyond local boundaries into regions of dissimilar gradients. This may cause un- 
desirable blurring across sharp ridges and corners where the bilaterally smoothed 
gradient A changes abruptly. 

This problem is solved by the binary function $\delta_A$ introduced in the previous 
equation. This function exploits a feature of the functional shape of the smoothed 
gradient field A to limit the contribution of pixel $(x - u, y - v)$ if it lies across 
a sharp edge. A sharp edge is present if there is a large jump in the magnitude of 
A between (x, y) and $(x - u, y - v)$. Thus, $\delta_A$ is the Kronecker delta function, 
which is 1 if the gradient step is below a specified threshold R, and 0 if the jump 
in gradient magnitude is too large. This is represented as follows. 


\[
\delta_A(x - u, y - v) =
\begin{cases}
1 & \text{if } \lVert A(x - u, y - v) - A(x, y) \rVert < R \\
0 & \text{otherwise}
\end{cases}
\]


A computationally efficient way of approximating the search for gradients in a local 
neighborhood for a pixel (x, y) is to precompute a stack of minimum and max- 
imum gradients at different spatial resolutions. We refer to the original paper on 
trilateral filtering for additional information [10]. 

Although the method has seven internal parameters, only one needs to be specified 
by the user. All other parameters are derived from this single user parameter. 
The user parameter $\sigma_{c_\theta}$ is the neighborhood size of the bilateral 
gradient-smoothing filter, specified in pixels. The influence of this parameter on 
the various stages of processing is shown in Figure 8.12. 

For small kernel sizes, too much detail ends up in the base layer, which is 
subsequently compressed. The consequence is that these details are absent from the 
final tone-mapped image. For larger values of $\sigma_{c_\theta}$, the details are 
separated more sensibly from the HDR component and thus detail is preserved in the 
tone-mapped images. This is shown in the rightmost column in Figure 8.12, where 
$\sigma_{c_\theta}$ is set to 21. This value is recommended for practical use. Larger 
values have an adverse effect on the computation time without creating better images. 

FIGURE 8.12 For three different values of $\sigma_{c_\theta}$ (3, 13, and 21 pixels), we 
show the base layer (top), the detail layer (middle), and the tone-mapped result (bottom). 

For the purpose of comparison, Figure 8.13 shows an image tone mapped with 
both bilateral and trilateral filters. With comparable parameter choices, the overall 
impression of the two images is similar, although several differences between the 
two images exist. In particular, the trilateral filter affords a better visualization of 
the clouds. On the other hand, the tree in the lower right-hand corner is better 
preserved by the bilateral filter. 

FIGURE 8.13 Bilateral (left) and trilateral filters (right) applied to the same image with 
comparable parameter settings. 

In summary, the trilateral filter is a further development over the bilateral filter. 
The filter smooths the image while preserving edges. Good results are achieved by 
tilting the filter kernel dependent on the local gradient information in an image. 
Like Oppenheim’s method and Durand and Dorsey’s bilateral filtering approach, 
Choudhury and Tumblin’s trilateral filter is used to separate a density image into an 
LDR high-frequency image, and an HDR low-frequency image. The latter is com- 
pressed and recombined with the former to produce a tone-mapped density image. 
This result is then exponentiated to compute a displayable image. 


8.2 GRADIENT DOMAIN OPERATORS 


High-frequency components in an image cause rapid changes from one pixel to 
the next. On the other hand, low-frequency features cause the differences between 
neighboring pixels to be relatively small. It is therefore possible to partially 
distinguish between illuminance and reflectance in a different way by considering the 
gradients in the image. Under the assumption of diffusely reflecting scenes, this 
separation may be reasonably successful, as shown by Horn’s lightness computation 
(discussed in the following section). 

Although such separation depends on thresholding, tone reproduction does not 
necessarily require separation of illuminance and reflectance. In addition, HDR im- 
agery frequently depicts scenes that deviate significantly from the assumption of 
diffuse reflection. Fattal et al. have shown that image gradients may be attenuated 
rather than thresholded, leading to a capable tone-reproduction operator (discussed 
in Section 8.2.2). 


8.2.1 HORN LIGHTNESS COMPUTATION 


The first to explore the idea of separating reflectance from illuminance on the basis 
of the gradient magnitude was Berthold Horn [53]. His work outlines a computational 
model of human lightness perception, where lightness is a perceptual quantity that 
correlates with surface reflectance. Like Stockham and colleagues [91,123], this work 
assumes that each pixel of an image is formed as the product of illumination and 
surface reflectance, as follows. 


\[
L_v(x, y) = E_v(x, y)\, r(x, y)
\]


Here, $L_v(x, y)$ is the pixel’s luminance and $E_v(x, y)$ and $r(x, y)$ are illuminance 
and reflectance components, respectively. In the log domain, a density image would 
represent the same information, as follows. 


\[
\begin{aligned}
D(x, y) &= \log(L_v(x, y)) \\
        &= \log(E_v(x, y)) + \log(r(x, y))
\end{aligned}
\]


Taking the derivative of D(x, y) gives us the gradient, which is a 2-vector of partial 
derivatives in the horizontal and vertical directions. The gradient field of an image 
may be approximated using forward differences, as follows. 


\[
\begin{aligned}
\nabla G(x, y) &= \bigl(G_x(x, y),\, G_y(x, y)\bigr) \\
               &= \bigl(D(x + 1, y) - D(x, y),\; D(x, y + 1) - D(x, y)\bigr)
\end{aligned}
\]


Note that differences in the log domain correspond to ratios in linear space. By 
computing a gradient field of a density image we are effectively computing contrast 
ratios. 
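A small numerical check makes the relation between log-domain differences and contrast ratios concrete. The sketch below uses forward differences as in the equation above; the choice to set the gradient to zero along the last column and row is our own boundary assumption, and the image is indexed as `density[y][x]`.

```python
import math

def forward_gradient(density):
    """Forward differences of a density image stored as a list of rows."""
    h, w = len(density), len(density[0])
    gx = [[density[y][x + 1] - density[y][x] if x + 1 < w else 0.0
           for x in range(w)] for y in range(h)]
    gy = [[density[y + 1][x] - density[y][x] if y + 1 < h else 0.0
           for x in range(w)] for y in range(h)]
    return gx, gy

luminance = [[1.0, 10.0], [100.0, 1000.0]]
density = [[math.log(v) for v in row] for row in luminance]
gx, gy = forward_gradient(density)
ratio_x = math.exp(gx[0][0])   # luminance ratio between horizontal neighbors
ratio_y = math.exp(gy[0][0])   # luminance ratio between vertical neighbors
```

Exponentiating a gradient component recovers the luminance ratio between the two neighboring pixels, here 10 horizontally and 100 vertically.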

Edges in an image will produce sharp pulses in the gradient field, whereas the 
spatial variation of illumination will produce only small gradient values. To sepa- 
rate reflectance from illuminance, it is now possible to threshold the gradient and 
discard any small gradients, as follows. 


\[
\nabla G(x, y) = 0 \quad \text{if } \sqrt{G_x(x, y)^2 + G_y(x, y)^2} < t
\]


Integration of the remaining gradients yields an image that represents lightness. 
Integration of a discrete image is straightforward in one dimension, but amounts 
to solving a partial differential equation in two dimensions. The particular form of 
this equation is as follows. 


\[
\nabla^2 D(x, y) = \operatorname{div} G(x, y)
\]


This is Poisson’s equation with $\nabla^2$ the Laplacian operator and div G(x, y) the 
divergence of G(x, y). The Laplacian and divergence may be approximated in two 
dimensions using a differencing scheme, as follows. 


\[
\begin{aligned}
\nabla^2 D(x, y) &\approx D(x + 1, y) + D(x - 1, y) + D(x, y + 1) + D(x, y - 1) - 4 D(x, y) \\
\operatorname{div} G(x, y) &\approx G_x(x, y) - G_x(x - 1, y) + G_y(x, y) - G_y(x, y - 1)
\end{aligned}
\]


The Poisson equation cannot be solved analytically, but must be approximated nu- 
merically. The method of choice is the full multigrid method, for which off-the- 
shelf routines are available [101]. Finally, the resulting density image D(x, y) is 
exponentiated to produce the final image $L_d(x, y)$. 
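To make the integration step concrete, the following sketch solves the discrete Poisson equation on a tiny image. Note that it swaps in plain Gauss-Seidel iteration for the full multigrid method recommended in the text; this is far slower on real images but fits in a few lines. The boundary handling (neighbors and gradient terms outside the image are simply dropped) and all function names are our own assumptions.

```python
def forward_gradient(density):
    h, w = len(density), len(density[0])
    gx = [[density[y][x + 1] - density[y][x] if x + 1 < w else 0.0
           for x in range(w)] for y in range(h)]
    gy = [[density[y + 1][x] - density[y][x] if y + 1 < h else 0.0
           for x in range(w)] for y in range(h)]
    return gx, gy

def divergence(gx, gy):
    """div G = Gx(x,y) - Gx(x-1,y) + Gy(x,y) - Gy(x,y-1), dropping terms outside the image."""
    h, w = len(gx), len(gx[0])
    div = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            div[y][x] = gx[y][x] + gy[y][x]
            if x > 0:
                div[y][x] -= gx[y][x - 1]
            if y > 0:
                div[y][x] -= gy[y - 1][x]
    return div

def solve_poisson(div, iterations=500):
    """Gauss-Seidel sweeps for the discrete Poisson equation (image must exceed 1 pixel)."""
    h, w = len(div), len(div[0])
    d = [[0.0] * w for _ in range(h)]
    for _ in range(iterations):
        for y in range(h):
            for x in range(w):
                total, count = 0.0, 0
                for ny, nx in ((y, x - 1), (y, x + 1), (y - 1, x), (y + 1, x)):
                    if 0 <= ny < h and 0 <= nx < w:
                        total += d[ny][nx]
                        count += 1
                d[y][x] = (total - div[y][x]) / count
    return d

density = [[0.0, 1.0, 3.0], [2.0, 2.5, 4.0], [5.0, 5.5, 7.0]]
gx, gy = forward_gradient(density)
recovered = solve_poisson(divergence(gx, gy))
shift = density[0][0] - recovered[0][0]      # the solution is only defined up to a constant
max_error = max(abs(recovered[y][x] + shift - density[y][x])
                for y in range(3) for x in range(3))
```

Integrating an unmodified gradient field returns the original density image up to an additive constant, which is why the result is normally shifted or normalized before exponentiation.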

The success of separating illuminance from reflectance in this manner depends 
on the choice of threshold value t. Setting the threshold too low will cause the 
resulting image to contain both the reflectance component and some residual 
illuminance. If the threshold is chosen too high, the integrated result will only 
partially represent reflectance. 

FIGURE 8.14 Left: a reproduction of one of Piet Mondrian’s paintings. On the right, a 
mini-world of Mondrian (from [88]). 
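The thresholding scheme is easiest to see in one dimension, where integration reduces to a running sum. The following sketch is an illustration only: the scanline is a made-up product of a two-level reflectance and a slowly rising illuminance, and the function name and threshold are our own choices.

```python
import math

def horn_lightness_1d(luminance, t):
    density = [math.log(v) for v in luminance]
    grad = [density[i + 1] - density[i] for i in range(len(density) - 1)]
    kept = [g if abs(g) >= t else 0.0 for g in grad]  # discard small gradients
    out = [density[0]]
    for g in kept:
        out.append(out[-1] + g)                       # 1D integration is a running sum
    return [math.exp(d) for d in out]

reflectance = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8]
illuminance = [1.0, 1.05, 1.10, 1.15, 1.20, 1.25]
scanline = [r * e for r, e in zip(reflectance, illuminance)]
lightness = horn_lightness_1d(scanline, t=0.1)
```

The slow illuminance ramp is removed, leaving two flat regions separated by the reflectance edge. The edge itself still carries the small illuminance change that coincides with it, which is one way in which the separation remains incomplete.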


It is also important to note that this method assumes that no light sources are 
directly visible in the image. Horn presented his work in the context of Land’s 
retinex theory, which was tested with mini-worlds of Mondrian [69].² In such 
worlds, scenes are flat areas divided into subregions of uniform matte color (see 
Figure 8.14). The lighting of such worlds creates smooth shading variations within 
each panel, but sharp gradient jumps between regions. Thus, the observation that 
reflectance causes sharp spikes in the gradient whereas illuminance is smoothly 
varying holds for this type of idealized scene. 

For practical scenes that are generally more complicated, this assumption may 
not hold. In particular, if there are light sources directly visible in the image one 
may expect the illuminance component to also exhibit large gradients, causing this 
approach to fail to successfully separate illuminance and reflectance. Depth 
discontinuities, specular highlights, and the presence of fluorescent materials may also 
cause the separation of reflectance from illuminance to be incomplete. Horn 
concludes that the method may be reasonable for the computation of lightness, but not 
for computing reflectance when applied to general images. 

² Mini-worlds of Mondriaan are inspired by the neo-plasticist painting style pioneered by 
famous Dutch artist Piet Mondriaan. Over time, the spelling has become anglicized so that 
mini-worlds of Mondriaan are now more commonly known as mini-worlds of Mondrian. 

In Figure 8.15 the output of this approach is shown for different threshold values t. 
Small gradients are indeed removed, whereas reflectance edges are reasonably well 
respected, depending on the choice of threshold value. The images produced by this 
technique bear a resemblance to those created by bilateral filtering. 
In particular, compare the results of Figure 8.15 with Figure 8.6. Both techniques 
blur images without blurring across sharp edges. We therefore speculate that Horn’s 
lightness computations may be viewed as an early example of an edge-preserving 
smoothing operator. 

For the purpose of demonstration, Figure 8.15 shows an LDR image because it 
is close in nature to a mini-world of Mondrian. HDR images do not tend to adhere 
to the restrictions imposed by the mini-worlds of Mondrian. The direct application 
of the previous thresholding technique is therefore not practical. On the other 
hand, large gradients in HDR images are correlated with illuminance variations. We 
therefore applied the same thresholding technique to an HDR image, although now 
we remove gradients that are larger than the threshold t, as follows. 


\[
\nabla G(x, y) = 0 \quad \text{if } \sqrt{G_x(x, y)^2 + G_y(x, y)^2} > t
\]


Results of this new thresholding scheme are shown in Figure 8.16. It is clear that 
this thresholding scheme is a fairly crude method of bringing an HDR image within 
a displayable range. It indicates that compressing the gradient field in some fashion 
may be a viable approach, although perhaps not using simple thresholding. 

Although Horn was largely interested in computational models of human lightness 
perception, we have shown that a small modification could make the technique 
suitable for HDR compression. Thresholding may be too crude for practical 
purposes, and the appropriate selection of a suitable threshold would be a matter 
of trial and error. On the other hand, modifying the gradient field of an image 
and then integrating the result does present an opportunity for effective dynamic 
range reduction. This approach is taken by Fattal’s gradient domain compression 
algorithm, discussed in the following section. 

FIGURE 8.15 Horn’s lightness computation for threshold values of t = 0.0 (top left) through 
t = 0.1 in increments of 0.02. The original photograph is shown in the top left. 

FIGURE 8.16 New thresholding scheme applied to Horn’s lightness computation using 
threshold values of t = 0.25 (top left) through t = 1.50 in increments of 0.25. (Image 
courtesy of the Albin Polasek Museum, Winter Park, Florida.) 


8.2.2 FATTAL GRADIENT DOMAIN COMPRESSION 


Fattal et al. presented an alternative compression algorithm that achieves HDR re- 
duction by applying a compressive function to the gradient field [32]. Following 
Horn, they compute the gradient field of a density image, manipulate these gradi- 
ents, and then integrate by solving a Poisson equation. 

However, rather than thresholding the gradient field their compressive function 
is more sophisticated. Fattal et al. observe that any drastic change in luminance 
across an HDR image gives rise to luminance gradients with a large magnitude. On 
the other hand, fine details (such as texture) correspond to much smaller gradi- 
ents. The proposed solution should therefore identify gradients at various spatial 
scales and attenuate their magnitudes. By making the approach progressive (i.e., 
larger gradients are attenuated more than smaller gradients), fine details may be 
preserved while compressing large luminance gradients. After computing a density 
image D(x, y) = log(L(x, y)), the method proceeds by computing the gradient 
field $\nabla G(x, y)$, as follows. 


\[
\nabla G(x, y) = \bigl(D(x + 1, y) - D(x, y),\; D(x, y + 1) - D(x, y)\bigr)
\]


This gradient field is then attenuated by multiplying each gradient with a 
compressive function $\Phi(x, y)$, resulting in a compressed gradient field 
$\nabla G'(x, y)$, as follows. 


\[
\nabla G'(x, y) = \nabla G(x, y)\, \Phi(x, y)
\]


As in Horn’s approach, a compressed density image D’(x, y) is constructed by 
solving the Poisson equation, as follows. 


\[
\nabla^2 D'(x, y) = \operatorname{div} G'(x, y)
\]


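The rationale for solving this partial differential equation is that we seek a density 
image $D'(x, y)$ with a gradient that approximates $G'(x, y)$ as closely as possible. In 
the least squares sense, this conforms to minimizing the following integral. 


\[
\begin{aligned}
\iint \bigl\lVert \nabla D'(x, y) - G'(x, y) \bigr\rVert^2 \, dx\, dy
 = \iint \left[ \left( \frac{\partial D'(x, y)}{\partial x} - G'_x(x, y) \right)^2
 + \left( \frac{\partial D'(x, y)}{\partial y} - G'_y(x, y) \right)^2 \right] dx\, dy
\end{aligned}
\]


According to the variational principle, $D'(x, y)$ must satisfy the Euler-Lagrange 
equation [101], yielding 


\[
2 \left( \frac{\partial^2 D'(x, y)}{\partial x^2} - \frac{\partial G'_x(x, y)}{\partial x} \right)
+ 2 \left( \frac{\partial^2 D'(x, y)}{\partial y^2} - \frac{\partial G'_y(x, y)}{\partial y} \right) = 0
\]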


Rearranging terms produces this Poisson equation, which may be solved using the 
full multigrid method [101]. Exponentiating the compressed density image then 
produces the tone-mapped image $L_d(x, y)$, as follows. 


\[
L_d(x, y) = \exp(D'(x, y))
\]


To a large extent the choice of attenuation function will determine the visual quality 
of the result. In the previous section, a very simple example is shown by setting large 
gradients to zero. This produces compressed images, but at the cost of visual quality. 
Fattal et al. follow a different approach and only attenuate large gradients. 

Their attenuation function is based on the observation that edges exist at multiple 
scales [148]. To detect significant ratios, a multiresolution edge-detection scheme 
is employed. Rather than attenuate a significant gradient at the resolution where it 
is detected, the attenuation is propagated to the full resolution gradient field before 
being applied. This scheme avoids haloing artifacts. 

First, a Gaussian pyramid $D_0, D_1, \ldots, D_d$ is constructed from the density image. 
The number of levels d is chosen such that at this coarsest level the resolution of the 
image is at least 32 by 32. At each level s, a gradient field $\nabla G_s(x, y)$ is 
computed using central differences, as follows. 


\[
\nabla G_s(x, y) = \left( \frac{D_s(x + 1, y) - D_s(x - 1, y)}{2^{s+1}},\;
                          \frac{D_s(x, y + 1) - D_s(x, y - 1)}{2^{s+1}} \right)
\]


At each level and for each pixel, a scale factor may be computed based on the 
magnitude of the gradient, as follows. 


\[
\varphi_s(x, y) = \frac{\alpha}{\lVert \nabla G_s(x, y) \rVert}
                  \left( \frac{\lVert \nabla G_s(x, y) \rVert}{\alpha} \right)^{\beta}
\]

This scale factor features two user-defined parameters $\alpha$ and $\beta$. Gradients 
larger than $\alpha$ are attenuated provided that $\beta < 1$, whereas smaller gradients 
are not attenuated and in fact may even be somewhat amplified. A reasonable value for 
$\alpha$ is 0.1 times the average gradient magnitude. Fattal et al. suggest setting user 
parameter $\beta$ between 0.8 and 0.9, although we have found that larger values (up to 
about 0.96) are sometimes required to produce a reasonable image. The attenuation 
function $\Phi(x, y)$ can now be constructed by considering the coarsest level first and 
then propagating partial values in top-down fashion, as follows. 


\[
\begin{aligned}
\Phi_d(x, y) &= \varphi_d(x, y) \\
\Phi_s(x, y) &= U\bigl(\Phi_{s+1}(x, y)\bigr)\, \varphi_s(x, y) \\
\Phi(x, y) &= \Phi_0(x, y)
\end{aligned}
\]

Here, $\Phi_s(x, y)$ is the partially accumulated scale factor at level s, and U() is an 
upsampling operator with linear interpolation. For one image, the two parameters 
$\alpha$ and $\beta$ were varied to create the tableau of images shown in Figure 8.17. 
For smaller values of $\beta$, more details are visible in the tone-mapped image. A 
similar effect occurs for decreasing values of $\alpha$. Both parameters afford a 
trade-off between the amount of compression applied to the image and the amount of 
detail visible. In our opinion, choosing values that are too small for either $\alpha$ 
or $\beta$ produces images that contain too much detail to appear natural. 


FIGURE 8.17 Fattal’s gradient domain compression. The user parameters $\alpha$ and 
$\beta$ were varied: from left to right $\alpha$ is given values of 0.10, 0.25, and 0.40. 
From top to bottom, $\beta$ is 0.85, 0.89, and 0.95. 

FIGURE 8.18 Clamping in Fattal’s gradient domain compression. Top row: clamping 0.1, 1, 
and 10% of the dark pixels. Bottom row: clamping 0.1, 1, and 10% of the light pixels. 


This approach may benefit somewhat from clamping, a technique whereby a 
percentage of the smallest and largest pixel intensities is removed and the 
remaining range of intensities is scaled to fit the display range. Figure 8.18 shows the effect 
of varying the percentage of dark pixels that are clamped (top row) and separately 
varying the percentage of light pixels that are clamped (bottom row). The effect 
of clamping dark pixels is fairly subtle, but dramatic effects may be achieved by 
clamping a percentage of light pixels. In general, if a tone-mapped image appears 
too gray it may be helpful to apply some clamping at the light end. This removes 
outliers that would cause the average luminance to drop too much after normaliza- 
tion. 
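Clamping as described here can be sketched as a percentile cut followed by rescaling. The function below is a hypothetical helper, not code from any of the operators discussed; in particular, the way the percentages are converted to indices into the sorted values is our own choice.

```python
def clamp_and_normalise(values, dark_percent=1.0, light_percent=1.0):
    """Clamp the given percentages of darkest and lightest values, then scale to [0, 1]."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[min(n - 1, int(n * dark_percent / 100.0))]
    hi = ordered[max(0, n - 1 - int(n * light_percent / 100.0))]
    if hi <= lo:
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]

pixels = [float(i) for i in range(100)]
pixels[-1] = 1000.0                          # a single very bright outlier
unclamped = clamp_and_normalise(pixels, 0.0, 0.0)
clamped = clamp_and_normalise(pixels, 0.0, 1.0)
```

Without clamping, the outlier forces the mid-range pixel with value 50 down to 0.05 after normalization. Clamping 1% of the light pixels raises it to about 0.51, which matches the observation that clamping at the light end helps when an image appears too gray.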

In summary, Fattal’s gradient domain compression technique attenuates gradi- 
ents, but does so in a gentler manner than simply thresholding. The two user para- 
meters provide a trade-off between the amount of compression and the amount of 
detail available in the image. Too much compression has the visual effect of 
exaggerated small details. The technique is similar in spirit to Horn’s lightness 
computations and is the only recent example of a tone-reproduction operator working on 
gradient fields. 


8.3 PERFORMANCE 


For many applications the speed of operation is important. For most tone- 
reproduction operators, performance is simply a function of the size of the image. 
In this section we report results obtained on an Apple iBook G3 running at 800 
MHz using images with 1,600 by 1,200 pixels. 

We show the timing required to execute each tone-mapping operator, but ex- 
clude the time it takes to read the image from disk or write the result to file. We also 
routinely normalize the result of the tone-reproduction operators and apply gamma 
correction. None of these operations is included in the timing results. 

All timing results are summarized in Table 8.1. This table should be interpreted 
with the following caveats. The timing given for Miller’s operator is a rough average 
of the timings shown in Figure 8.19. The timing for the bilateral filter is given 
for a downsampling factor of 16. The computation time of Chiu’s spatially variant 
operator is representative of the algorithm explained in this chapter, but not for 
the full algorithm as described by Chiu et al. (in that we have omitted the iterative 
smoothing stage). 

Operator                                        Time (in seconds) 

GLOBAL OPERATORS 
Miller’s operator                               ~15.0 
Tumblin-Rushmeier’s operator                    3.2 
Ward’s scale factor                             0.96 
Ferwerda’s operator                             1.0* 
Ferschin’s exponential mapping                  3.0 
Logarithmic mapping                             3.4 
Drago’s logarithmic mapping                     2.8 
Reinhard’s global photographic operator         3.7 
Reinhard and Devlin’s photoreceptor model       9.7 
Ward’s histogram adjustment                     3.4 
Schlick’s uniform rational quantization         3.4 

LOCAL OPERATORS 
Chiu’s spatially variant operator               10.0 
Rahman and Jobson’s multiscale retinex          120.0 
Johnson and Fairchild’s iCAM                    66.0 
Ashikhmin’s operator                            120.0 
Reinhard’s local photographic operator          80.0 

GRADIENT DOMAIN OPERATORS 
Horn’s lightness computation                    45.0 
Fattal’s gradient domain compression            45.5 

FREQUENCY-BASED OPERATORS 
Oppenheim’s operator                            12.4 
Durand’s bilateral filtering                    23.5 

TABLE 8.1 Computation time for all operators using 1,600 by 1,200 images on 
an Apple iBook with 512 MB RAM and a G3 processor running at 800 MHz. Note: 
* Ferwerda’s operator does not include the algorithm to lower visual acuity in scotopic 
lighting conditions. 


8.3.1 LOCAL AND GLOBAL OPERATORS 


In general, global operators are the fastest to execute because normally only two 
or three passes over the image are required. In each pass, only very simple com- 
putations are performed. For applications that require real-time operation, global 
operators would be the first choice. 

Local operators rely on the computation of local averages for each pixel. Such 
local averages are often computed by convolving the image with a filter kernel. For 
filter kernels larger than about 3 by 3 pixels, it is faster to Fourier transform both 
the image and the filter kernel and perform a pairwise multiplication in the Fourier 
domain. The convolved image is then obtained by applying an inverse Fourier trans- 
form on the result. Whether the convolution is computed directly or by means of 
the Fourier transform, local operators tend to be much slower than their global 
counterparts. 
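The equivalence between direct convolution and pairwise multiplication in the Fourier domain can be checked on a short signal. The sketch below uses a naive discrete Fourier transform in place of an FFT, so it demonstrates only the identity and not the speed advantage; the convolution is circular, and the kernel is assumed to be zero-padded to the length of the signal.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform, standing in for an FFT."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def circular_convolve_direct(signal, kernel):
    n = len(signal)
    return [sum(signal[(i - j) % n] * kernel[j] for j in range(n)) for i in range(n)]

def circular_convolve_fourier(signal, kernel):
    product = [a * b for a, b in zip(dft(signal), dft(kernel))]  # pairwise multiplication
    return [v.real for v in dft(product, inverse=True)]

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]   # small blur, zero-padded
direct = circular_convolve_direct(signal, kernel)
fourier = circular_convolve_fourier(signal, kernel)
```

Both routes give the same result up to rounding error. With a true FFT the Fourier route costs on the order of n log n operations regardless of kernel size, which is why it wins for all but the smallest kernels.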

The performance of global operators is usually dependent only on the size of the 
image. The exception is Miller’s operator, which is also weakly dependent on the 
maximum display luminance. The running time as a function of maximum display 
luminance is plotted in Figure 8.19. In all cases, the execution time remains below 
about 17 seconds. 

The performance of the iCAM model depends on whether the adaptation level is 
computed from the luminance channel only, or for all three channels independently. 
The former takes 44 seconds, whereas the latter takes 66 seconds. 

Our implementation of the multiscale observer model requires a substantial 
amount of memory to store the full image pyramid. We were not able to reliably 
measure the execution time of this operator for the default image size of 1,600 by 
1,200 because our iBook did not have sufficient memory (512 MB) to complete the 
computation without significant swapping. 

The running time of the local version of Reinhard’s photographic operator is 
about 80 seconds. For two reasons, this constitutes a performance gain with respect 
to other operators that also build a Gaussian pyramid. First, this operator compresses 
a single luminance channel, as opposed to three color channels in the multiscale 
observer model. Second, in comparison with Ashikhmin’s operator, the total number of 
levels in the Gaussian pyramid is smaller. 


8.3.2 GRADIENT AND FREQUENCY DOMAIN 
OPERATORS 


Gradient domain operators require an integration step that is both approximate and 
costly, though less so than techniques that build image pyramids. The numerical 
integration method of choice is the full multigrid method, which dominates the 
computation time of this class of operators. For example, Horn’s lightness compu- 
tation takes about 45 seconds. We found the gradient domain operator to be very 
similar in performance to Horn’s operator (about 45.5 seconds). 

Frequency domain operators rely on FFTs to obtain a frequency-space represen- 
tation. We used the public domain FFTW library [39], and for these operators the 
performance of the FFT dominates the computation time. Note that the 
speed of executing an FFT depends strongly on the size of the image. Any image 
size that has a large number of small prime factors will be substantially faster 
than image sizes that have fewer, larger factors. Although the running time depends 
on image size, this dependency is not linear. Our results are obtained with 
1,600-by-1,200 images. These numbers may be factored into 
2 x 2 x 2 x 2 x 2 x 2 x 5 x 5 and 2 x 2 x 2 x 2 x 3 x 5 x 5, and therefore have 
eight and seven factors, respectively. This yields a relatively fast computation of 
FFTs. On the other hand, if the image were smaller by 1 pixel in each dimension 
(1,599 by 1,199 pixels) the factors would be 3 x 13 x 41 and 11 x 109. This would 
have a negative impact on the computation time. In general, images whose dimensions 
are powers of 2 will be the fastest and images whose dimensions are prime numbers 
will be the slowest to compute. In terms of performance, it is beneficial to pad 
images to a power-of-2 size if a Fourier transform needs to be computed. 
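
The factorizations quoted above are easy to check, and the same few lines can suggest a padded size. The sketch below is only an illustration: the helper names `prime_factors` and `next_power_of_two` are ours and do not correspond to any routine in the FFT library.

```python
def prime_factors(n):
    """Return the prime factors of n (n >= 2), smallest first, with repetition."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors

def next_power_of_two(n):
    """Smallest power of 2 that is greater than or equal to n."""
    power = 1
    while power < n:
        power *= 2
    return power

for size in (1600, 1200, 1599, 1199):
    print(size, prime_factors(size), "pad to", next_power_of_two(size))
```

Padding a 1,600-by-1,200 image to powers of 2 gives 2,048 by 2,048 pixels, which is more than twice the number of pixels. Whether the padding pays off therefore depends on how unfavorable the original factorization is.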

An alternative filtering technique is to apply a fast but approximate filter, such 
as that described by Burt and Adelson [8]. For filter kernels with a Gaussian shape, 
this may speed up the computations, but with a loss of accuracy. We have found 
that this approximation is useful only for larger filter kernels. For very small filter 
kernels, this approximation may not be accurate enough. 

Oppenheim’s frequency-based operator takes about 12.4 seconds. Bilateral filter- 
ing takes about 23 seconds when a downsampling factor of 16 is selected. This is 
somewhat slower than Oppenheim’s operator, although not dramatically so. Smaller 
downsampling factors will cause the computation time to increase significantly. If 
no downsampling is used, the computation time of the same image increases to 
685 seconds. The progression of computation times as a function of downsampling 
factors is depicted in Figure 8.20. 

The computational complexity of the trilateral filter is of necessity higher than 
for the bilateral filter, in that its main computational cost is the double application 
of the bilateral filter. If the same optimizations are employed as outlined for the 
bilateral filter in Section 8.1.2, we expect the running times to be about double 
that of the bilateral filter. However, our implementation does not incorporate these 
optimizations, and we therefore recorded computation times that are substantially 
higher. 


8.4 DISCUSSION 


Tone-reproduction operators achieve dynamic-range reduction based on a small set 
of distinct observations. We have chosen to classify operators into four classes, two 
of which are loosely based on image formation and two others operating in the 
spatial domain. 

The underlying assumption of the first two classes is that images are formed 
by light being reflected from surfaces. In particular, the light intensities recorded 
in an image are assumed to be the product of light being reflected from a surface 
and the surface’s ability to reflect. This led to Oppenheim’s frequency-dependent 
attenuation. Subdivision of an image into base and detail layers may be seen as a 
frequency-dependent operation. The bilateral and trilateral filters, however, lift the 
restriction that images are assumed largely diffuse. These filters operate well for im- 
ages depicting scenes containing directly visible light sources, specular reflections, 
and so on. 

A parallel development has occurred in the class of gradient domain opera- 
tors. Horn’s lightness computation is aimed at disentangling illumination from re- 
flectance by thresholding gradients. It necessarily assumes that scenes are diffuse. 
This restriction is lifted by Fattal et al., who attenuate large gradients but keep small 
gradients intact. 

Various tone-reproduction operators use partial computational models of the 
human visual system to achieve dynamic-range reduction. A recurring theme within 
this class of operators is the notion of adaptation luminance. Global operators often 
derive an adaptation level from the (log) average luminance of the image, whereas 
local operators compute a separate adaptation level for each pixel. Local adaptation 
levels are effectively weighted local averages of pixel neighborhoods. If the size of 
these neighborhoods is grown to include the entire image, these local operators 
default to global operators. It should therefore be possible to replace the global 
adaptation level of a global operator with a locally derived set of adaptation lev- 
els. The validity of this observation is demonstrated in Figure 8.21, in which the 
semisaturation constant of Reinhard and Devlin’s photoreceptor model is fed with 
various luminance adaptation computations. 
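
The relation between global and local adaptation levels can be made concrete with a small sketch. It assumes a 1D scan line of strictly positive luminances and uses an unweighted log average inside each neighborhood; actual operators use weighted, two-dimensional neighborhoods, and the function names are hypothetical.

```python
import math

def log_average(luminances):
    """Log average (geometric mean); assumes strictly positive luminances."""
    return math.exp(sum(math.log(v) for v in luminances) / len(luminances))

def local_adaptation(luminances, radius):
    """One adaptation level per pixel, averaged over a window of +/- radius."""
    n = len(luminances)
    levels = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        levels.append(log_average(luminances[lo:hi]))
    return levels

scanline = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]

global_level = log_average(scanline)                 # one level for the image
local_levels = local_adaptation(scanline, radius=1)  # one level per pixel
# Once the neighborhood covers the whole scan line, every local level
# coincides with the global level.
global_like = local_adaptation(scanline, radius=len(scanline))
```

With a small radius the adaptation level follows the local luminance; with a radius that spans the image, the local computation reduces to the global one, as described above.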

There are various ways of computing local adaptation luminances. The use of a 
stack of Gaussian-blurred images may be closest to the actual working of the hu- 
man visual system, and with a carefully designed scale selection mechanism (such 
as shown by Reinhard et al.’s photographic operator, as well as Ashikhmin’s op- 
erator), local adaptation levels may be computed that minimize haloing artifacts. 
Alternatively, edge-preserving smoothing filters may be used to derive local 
adaptation luminances. Examples of such filters are the bilateral and trilateral 
filters, as well as the low-curvature image simplifier [132] and the mean shift 
algorithm [13]. 

FIGURE 8.21 Global photoreceptor model (top left) and global photoreceptor model with the 
light adaptation parameter set to 0 (top right), followed by local versions in which adapting 
luminances are computed with Durand’s bilateral filter (bottom left) and Pattanaik’s gain control 
operator (bottom right) [23,96]. 

The human visual system adapts over a period of time to novel lighting condi- 
tions. This is evident when entering a dark tunnel from bright daylight. It takes a 
short period of time before all details inside the tunnel become visible. Such 
time-dependent behavior may also be included in tone-reproduction operators 
[22,35,43,95,112]. 

Finally, we would like to stress the fact that each of these operators has its own 
strengths and weaknesses. Computational complexity, presence or absence of arti- 
facts, and ability to deal with extreme HDR images should all be considered. We 
believe that there is no single operator that will be the best choice for all tasks, or 
even for all images. 

For instance, in photography the purpose of tone reproduction may be to pro- 
duce an image that appears as beautiful as possible. It may not be necessary to show 
every last detail in the captured image to achieve this goal. In addition, for this type 
of application a fully automatic operator may be less desirable than one that provides 
intuitive user parameters that allow the final result to be steered in the direction the 
photographer has in mind. 

On the other hand, the task may be to visualize data, for instance if the HDR 
data is the result of a scientific simulation. In such cases it may be more important 
to visualize all important details than to produce an appealing image. It may be 
undesirable to have user parameters in this case. 

For video and film, tone reproduction should produce consistent results between 
consecutive frames. In addition, tone reproduction could conceivably be used cre- 
atively to steer the mood of the scene, and thus help convey a story. 

Thus, appropriate tone-reproduction operators should be matched to the task at 
hand. The current state of affairs is that we do not know how to match an operator 
to a given task. Selection of tone-reproduction operators is usually a matter of taste, 
as well as public availability of source code. We hope to alleviate the latter problem 
by having made the source code of all of our implementations available on the 
companion DVD-ROM. 


Image-based Lighting 


9.1 INTRODUCTION 


The previous chapters in this book have described numerous properties and advan- 
tages of HDR imagery. A major advantage is that HDR pixel values can cover the full 
range of light in a scene and can be stored as calibrated linear-response measure- 
ments of incident illumination. Earlier chapters have described how these images are 
useful for improved image processing, and for determining how a human observer 
might perceive a real-world scene, even if shown on an LDR display. 

This chapter describes how HDR images can be used as sources of illumina- 
tion for computer-generated objects and scenes. Because HDR images record the 
full range of light arriving at a point in space, they contain information about the 
shape, color, and intensity of direct light sources, as well as the color and distri- 
bution of the indirect light from surfaces in the rest of the scene. Using suitable 
rendering algorithms, we can use HDR images to accurately simulate how objects 
and environments would look if they were illuminated by light from the real world. 
This process of using images as light sources is called image-based lighting (IBL). 
Because IBL generally involves the use of HDR images, both the IBL process and the 
HDR images used for IBL are sometimes referred to as HDRI, for high-dynamic-range 
imagery. 

Figure 9.1 compares a simple scene illuminated by a traditional computer graph- 
ics light source (a) to its appearance as illuminated by three different image-based 
lighting environments (b through d). The scene’s geometry consists of simple 
shapes and materials such as plastic, metal, and glass. In all of these images, the 
lighting is being simulated using the RADIANCE global illumination system [205]. 
Without IBL (a), the illumination is harsh and simplistic, and the scene appears 
noticeably computer generated. With IBL (b through d), the scene’s level of realism 
and visual interest are increased: the shadows, reflections, and shading all exhibit 
complexities and subtleties that are realistic and internally consistent. In each 
rendering, a view of the captured environment appears in the background behind the 
objects. As another benefit of IBL, the objects appear to actually belong within the 
scenes that are lighting them. 

FIGURE 9.1 Scene illuminated with (a) a traditional point light source. Scene illuminated with 
(b through d) HDR image-based lighting environments, including (b) sunset on a beach, (c) inside a 
cathedral with stained-glass windows, and (d) outside on a cloudy day. 

In addition to HDR photography, IBL leverages two other important processes. 
One of them is omnidirectional photography, the process of capturing images that 
see in all directions from a particular point in space. HDR images used for IBL 
generally need to be omnidirectional, because light coming from every direction 
typically contributes to the appearance of real-world objects. This chapter describes 
some common methods of acquiring omnidirectional HDR images, known as light 
probe images or HDR environment maps, which can be used as HDR image-based lighting 
environments. 

The other key technology for IBL is global illumination: rendering algorithms that 
simulate how light travels from light sources, reflects between surfaces, and pro- 
duces the appearance of the computer-generated objects in renderings. Global illu- 
mination algorithms simulate the interreflection of light between diffuse surfaces, 
known as radiosity [170], and can more generally be built on the machinery of ray 
tracing [206] to simulate light transport within general scenes according to the render- 
ing equation [175]. This chapter describes how such algorithms operate and demon- 
strates how they can be used to illuminate computer-generated scenes and objects 
with light captured in light probe images. 

An important application of IBL is in the area of motion picture visual effects, 
where a common effect is to add computer-generated objects, creatures, and actors 
into filmed imagery as if they were really there when the scene was photographed. 
A key part of this problem is to match the light on the CG elements to be plausibly 
consistent with the light present within the environment. With IBL techniques, the 
real illumination can be captured at the location the CG object needs to be placed, 
and then used to light the CG element so that it has the same shading, shadows, and 
highlights as if it were really in the scene. Using this lighting as a starting point, 
visual effects artists can augment and sculpt the image-based lighting to achieve 
effects that are both dramatic and realistic. 

Within the scope of IBL, there are several variants and extensions that increase 
the range of application of the technique. When implemented naively in global il- 
lumination software, IBL renderings can be computationally expensive for scenes 
with concentrated light sources. This chapter presents both user-guided and auto- 
matic techniques for importance sampling that make IBL calculations more efficient. In 
addition, various approximations of IBL can produce convincing results for many 
materials and environments, especially under appropriate artistic guidance. One is 
environment mapping, a precursor to IBL that yields extremely efficient and often con- 
vincing renderings by directly mapping images of an environment onto object sur- 
faces. Another technique, ambient occlusion, uses some of IBL’s machinery to approxi- 
mate an object’s self-shadowing so that it can be quickly applied to different lighting 
environments. To begin, we will start with a detailed example of IBL in a relatively 
basic form. 


9.2 BASIC IMAGE-BASED LIGHTING 


This section describes image-based lighting in both theoretical and practical terms 
using the example of Rendering with Natural Light (RNL), an IBL animation shown at the 
SIGGRAPH 98 Electronic Theater. The RNL scene is a still life of computer-generated 
spheres on a pedestal, and is illuminated by light captured in the Eucalyptus grove 
at the University of California at Berkeley. The animation was modeled, rendered, 
and illuminated using the RADIANCE lighting simulation system [205], and the 
necessary scene files and images for creating the animation are included on the 
companion DVD-ROM. The animation was created via the following steps: 


1. Acquire and assemble the light probe image. 
2. Model the geometry and reflectance of the scene. 
3. Map the light probe to an emissive surface surrounding the scene. 
4. Render the scene as illuminated by the IBL environment. 
5. Postprocess and tone map the renderings. 


9.2.1 ACQUIRE AND ASSEMBLE THE LIGHT PROBE 


The lighting environment for RNL was acquired in the late afternoon using a 
three-inch chrome bearing and a digital camera. The mirrored ball and a digital video 
camera were placed on tripods about 4 feet from each other and 3.5 feet off the 
ground.* The digital video camera was zoomed until the sphere filled the frame, 
and the focus was set so that the reflected image was sharp. The camera’s aperture 
was narrowed to f/8 to allow for sufficient depth of field, and an HDR image series 
was acquired with shutter speeds varying from i second to T000 second, spaced 
one stop apart. To cover the scene with better sampling, a second series was acquired 
after having moved the camera 90 degrees around to see the ball from the side (see 
Section 9.3.1). The process took only a few minutes and the resulting image series 
can be seen in Figures 9.2 (a) and 9.2 (c). 

Mirrored spheres reflect nearly the entire environment they are in, and not, as 
sometimes assumed, just the hemisphere looking back in the direction of the 
camera. This follows from the basic mirror formula that the angle of incidence is equal 
to the angle of reflection: rays near the outer edge of a sphere’s image have an an- 
gle of reflection toward the camera that nears 90 degrees, and thus their angle of 
incidence also nears 90 degrees. Thus, the ray’s angle of incidence relative to the 
camera nears 180 degrees, meaning that the rays originate from nearly the opposite 
side of the sphere relative to the camera. 
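
The geometry can be verified numerically. The sketch below assumes an idealized orthographic camera looking down the negative z axis at a unit mirrored sphere, so that a point (x, y) in the sphere's image has the surface normal (x, y, sqrt(1 - x^2 - y^2)); a real camera at a finite distance sees slightly less than this. The function names are ours.

```python
import math

def reflected_direction(x, y):
    """Direction seen in the mirror at image point (x, y) of a unit sphere.

    Assumes an orthographic camera with viewing ray d = (0, 0, -1).
    The reflection is r = d - 2 (d . n) n, with n the surface normal.
    """
    r2 = x * x + y * y
    if r2 > 1.0:
        raise ValueError("point lies outside the sphere's image")
    nz = math.sqrt(1.0 - r2)
    return (2.0 * nz * x, 2.0 * nz * y, 2.0 * nz * nz - 1.0)

def angle_from_camera_deg(x, y):
    """Angle between the reflected ray and the direction back to the camera."""
    _, _, rz = reflected_direction(x, y)
    return math.degrees(math.acos(max(-1.0, min(1.0, rz))))

for radius in (0.0, math.sqrt(0.5), 0.9, 1.0):
    print(round(radius, 3), round(angle_from_camera_deg(radius, 0.0), 1))
```

Under these assumptions, the center of the sphere's image reflects the camera itself, a circle at about 71 percent of the radius already covers the full hemisphere facing the camera, and the outer ring covers the hemisphere behind the sphere.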

Each image series of the sphere was converted into an HDR image using the HDR 
image assembly algorithm in Debevec and Malik [164] (Chapter 4), and the images 
were saved in RADIANCE’s native HDR image format (Chapter 3). The algorithm 
derived the response curve of the video camera and produced HDR images where 
the pixel values were proportional to the light values reflected by the mirrored 
sphere. The total dynamic range of the scene was approximately 5,000:1, measuring 
from the dark shadows beneath the bushes to the bright blue of the sky and the thin 
white clouds lit from behind by the sun. As another measure of the range, the 
brightest pixel values in the sky and cloud regions were some 150 times the average 
level of light in the scene. The two views of the sphere were combined (using 
techniques presented in Section 9.3.1) and mapped into the angular map space 
(described in Section 9.4.2) to become the light probe image seen in Figure 9.2 (b). 


1 Many tripods allow the center pole (the vertical pole the tripod head is attached to) to be removed from the legs and 
reinserted upside-down, leaving the end of the pole pointing up and able to accommodate a mirrored sphere. 

FIGURE 9.2 The two HDRI series (a and c) used to capture the illumination in the Eucalyptus 
grove for RNL, and the resulting combined light probe image (b), converted into the angular map 
format. 


9.2.2 MODEL THE GEOMETRY AND REFLECTANCE OF 
THE SCENE 


The RNL scene’s spheres, stands, and pedestal were modeled using RADIANCE’s 
standard scene primitives and generators. Each sphere was given a different material 
property with different colors of glass, metal, and plastic. The pedestal itself was 
texture mapped with a polished marble texture. The scene specification files are 
included on the companion DVD-ROM as rnl_scene.rad and gensup.sh. 


9.2.3 MAP THE LIGHT PROBE TO AN EMISSIVE 
SURFACE SURROUNDING THE SCENE 


In IBL, the scene is surrounded (either conceptually or literally) by a surface that 
the light probe image is mapped onto. In the simplest case, this surface is an 
infinite sphere. The RNL animation used a large but finite inward-pointing cube, 
positioned so that the bottom of the pedestal sat centered on the bottom of the 
cube (Figure 9.3). The light probe image was mapped onto the inner surfaces of 
the cube so that from the perspective of the top of the pedestal the light from 
the environment would come from substantially the same directions as it would 
have in the forest. The RADIANCE shading language was sufficiently general to 
allow this mapping to be specified in a straightforward manner. When a ray hits a 
surface, it reports the 3D point of intersection P = (P_x, P_y, P_z) to user-supplied 
equations that compute the texture map coordinate for that point of the surface. 
In the RNL scene, the top of the pedestal was at the origin, and thus the 
direction vector into the probe was simply the vector pointing toward P. From this 
direction, the (u, v) coordinates for the corresponding pixel in the light probe 
image are computed using the angular map equations in Section 9.4.2. These 
calculations are specified in the file angmap.cal included on the companion 
DVD-ROM. 

FIGURE 9.3 The RNL pedestal is seen within the large surrounding box, texture mapped with 
the Eucalyptus Grove light probe image. 
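
To give an impression of what such a mapping computes, the following Python sketch converts a direction into angular map coordinates. It is not a transcription of angmap.cal. It assumes that the forward direction is the negative z axis and maps to the image center, that the distance from the center is proportional to the angle away from forward, and that (u, v) run from 0 to 1; the sign and axis conventions of Section 9.4.2 and of the file on the DVD-ROM take precedence over these assumptions.

```python
import math

def angular_map_uv(dx, dy, dz):
    """Map a direction to (u, v) in [0, 1] x [0, 1] of an angular map.

    Assumed conventions: forward is -z (image center) and the radius in the
    image grows linearly with the angle from forward, reaching 0.5 at 180
    degrees (directly behind).
    """
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("direction must be nonzero")
    dx, dy, dz = dx / length, dy / length, dz / length
    theta = math.acos(max(-1.0, min(1.0, -dz)))  # angle from forward
    s = math.hypot(dx, dy)
    if s < 1e-12:
        # Straight ahead is the center; straight behind is the whole rim,
        # so one rim point is picked arbitrarily.
        return (0.5, 0.5) if dz < 0.0 else (1.0, 0.5)
    r = theta / (2.0 * math.pi * s)
    return (0.5 + r * dx, 0.5 + r * dy)

# With the top of the pedestal at the origin, the direction into the probe is
# simply the intersection point P itself (hypothetical sample point).
P = (3.0, 0.0, 0.0)
u, v = angular_map_uv(*P)
```

A point directly to the side of the pedestal, 90 degrees from forward, lands halfway between the image center and its rim, which is the defining property of the angular map.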

In IBL, the surface surrounding the scene is specified to be emissive, so that its 
texture is treated as a map of light emanating from the surface rather than a map 
of surface reflectance. In RADIANCE this is done by assigning this environment the 
glow material, which tells the renderer that once a ray hits this surface the radiance 
along the ray should be taken directly as the HDR color in the image, rather than 
the product of the texture color and the illumination incident upon it. When the 
environment surface is viewed directly (as in most of Figure 9.3), it appears as an 
image-based rendering with the same pixel colors as in the original light probe 
image. 


9.2.4 RENDER THE SCENE AS ILLUMINATED BY THE 
IBL ENVIRONMENT 


With the scene modeled and the light probe image mapped onto the surround- 
ing surface, RADIANCE was ready to create the renderings using IBL. Appropriate 
rendering parameters were chosen for the number of rays to be used per pixel, 
and a camera path was animated to move around and within the scene. RADIANCE 
simulated how the objects would look as if illuminated by the light from the envi- 
ronment surrounding them. Some renderings from the resulting image sequence 
are shown in Figure 9.4. This lighting process is explained in further detail in 
Section 9.5. 

FIGURE 9.4 Three frames from Rendering with Natural Light: rendered frames before 
postprocessing (a, c, and e) and after postprocessing (b, d, and f), as described in 
Section 9.2.5. 


9.2.5 POSTPROCESS THE RENDERINGS 


A final step in creating an IBL rendering or animation is to choose how to tone 
map the images for display. RADIANCE and many recent rendering systems can 
output their renderings as HDR image files. Because of this, the RNL renderings 
exhibited the full dynamic range of the original lighting environment, including 
the very bright areas seen in the sky and in the specular reflections of the glossy 
spheres. Because the renderings exhibit a greater dynamic range than can be shown 
on typical displays, some form of tone mapping is needed to produce the final dis- 
playable images. The most straightforward method of tone mapping is to pick a 
visually pleasing exposure factor for the image, truncate the bright regions to the 
maximum “white” value of the display, and apply any needed compensation for the 
response curve of the display (most commonly, applying a gamma-correction func- 
tion). With this technique, values below the white point are reproduced accurately, 
but everything above the white point is clipped. 
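
This straightforward method amounts to three operations per pixel value, as in the sketch below. The display gamma of 2.2 and the function names are assumptions chosen for illustration; they are not parameters taken from the RNL production.

```python
def simple_tone_map(value, exposure=1.0, gamma=2.2):
    """Scale by an exposure factor, clip to display white, gamma correct.

    `value` is a linear HDR pixel value; the result lies in [0, 1].
    """
    scaled = value * exposure
    clipped = min(max(scaled, 0.0), 1.0)  # everything above white is lost
    return clipped ** (1.0 / gamma)

def to_8bit(display_value):
    """Quantize a display value in [0, 1] to an 8-bit code value."""
    return int(round(255.0 * display_value))

# Two HDR values above the white point become indistinguishable.
print(to_8bit(simple_tone_map(1.5)), to_8bit(simple_tone_map(6.0)))
```

The clipping step is what discards the distinction between moderately bright and extremely bright pixels, which motivates the alternatives discussed next.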

Chapters 6, 7, and 8 discussed several tone-reproduction operators that reduce 
the dynamic range of an image in a natural way to fall within the range of the dis- 
play, any of which could be used for postprocessing IBL images. Another approach 
to postprocessing HDR images is to simulate some of the optical imperfections of 
a real camera system that communicate the full dynamic range of bright regions 
through blooming and vignetting effects. Such operators often work well in con- 
junction with IBL rendering because, like IBL, they are designed to simulate the 
appearance of photographic imagery. 

Let’s first examine how a blurring operator can communicate the full dynamic 
range of a scene even on an LDR display. The top of Figure 9.5 shows two bright 
square regions in an image. In the HDR image file, the right-hand square is six 
times brighter than the left (as seen in the graph below the squares). However, 
because the maximum “white” point of the display is below the brightness of the 
dimmer square the displayed squares appear to be the same intensity. If we apply 
a Gaussian blur convolution to the HDR pixel values, the blurred squares appear 
very different, even when clipped to the display’s white point. The dim blurred 
square now falls considerably below the white point, whereas the middle of the 
bright blurred square still exceeds the range of the display. The brighter region also 
appears larger, even though the regions were originally the same size and are filtered 
with the same amount of blur. This effect is called blooming. 

FIGURE 9.5 Two bright image squares (a) with pixel values 1.5 and 6 are shown on a display 
with a white point of 1. Because of clipping, the squares appear the same. A graph of pixel intensity 
for a scan line passing through the centers of the squares is shown below, with “white” indicated 
by the dotted line. After applying a Gaussian blur (b), the blurred squares are very different in 
appearance, even though they are still clipped to the display white point. 
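
The experiment with the two squares can be reproduced on a single scan line. The sketch below uses the pixel values 1.5 and 6 and the white point of 1 from the figure; the square width of 10 pixels, the blur width of 8 pixels, and the function names are assumptions made for this example.

```python
import math

def gaussian_blur_1d(signal, sigma):
    """Blur a scan line with a normalized Gaussian, clamping at the borders."""
    radius = int(3 * sigma)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    n = len(signal)
    blurred = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += weight * signal[j]
        blurred.append(acc / total)
    return blurred

def clip(signal, white=1.0):
    """Clip a scan line to the display white point."""
    return [min(v, white) for v in signal]

WHITE = 1.0
scanline = [0.0] * 200
for i in range(45, 55):
    scanline[i] = 1.5   # dimmer square
for i in range(145, 155):
    scanline[i] = 6.0   # brighter square

shown_unblurred = clip(scanline, WHITE)                              # both look white
shown_hdr_blur = clip(gaussian_blur_1d(scanline, sigma=8.0), WHITE)  # blur, then clip
shown_ldr_blur = gaussian_blur_1d(clip(scanline, WHITE), sigma=8.0)  # clip, then blur

dim_peak = max(shown_hdr_blur[:100])
bright_width = sum(1 for v in shown_hdr_blur[100:] if v >= WHITE)
```

Blurring the HDR values before clipping leaves the dimmer square below white while the brighter square stays white over a region wider than the original 10 pixels. Blurring after clipping treats both squares identically, so the difference between them is lost.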


Similar blooming effects are seen frequently in real photographs, in which blur 
can be caused by any number of factors, including camera motion, subject motion, 
image defocus, “soft focus” filters placed in front of the lens, dust and coatings on 
lens surfaces, and scattering of light within the air and image sensor. Figure 9.6(a) 
shows an HDR image taken inside Stanford’s Memorial Church. When a 
clipped LDR version of the image is blurred horizontally (Figure 9.6(b)), the bright 
stained-glass windows become noticeably darker. When the HDR version of the 
image is blurred with the same filter (Figure 9.6(c)), the windows appear as vibrant 
bright streaks, even when clipped to the white point of the display. In addition to 
the HDR image series, the photographer also acquired a motion-blurred version 
of the church interior by rotating the camera on the tripod during a half-second 
exposure (Figure 9.6(d)). The bright streaks in this real blurred image (though not 
perfectly horizontal) are very similar to the streaks computed by the HDR blurring 
process seen in Figure 9.6(c), and dissimilar to the LDR blur seen in Figure 9.6(b). 

FIGURE 9.6 (a) An HDR scene inside a church with bright stained-glass windows. (b) Horizontally 
blurring the clipped LDR image gives the effect of image motion, but it noticeably darkens the 
appearance of the stained-glass windows. (c) Blurring an HDR image of the scene produces bright 
and well-defined streaks from the windows. (d) Real motion blur obtained by rotating the camera 
during the exposure validates the HDR blurring simulated in image (c). 

The renderings for Rendering with Natural Light were postprocessed by blending 
differently blurred versions of the renderings produced by RADIANCE: each final 
image used in the film was a weighted average of several such versions. All of the 
blur functions used in RNL were Gaussian filters, and their particular mixture is 
illustrated in Figure 9.7. 
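
A weighted average of this kind can be sketched on a single scan line as follows. The weights and blur widths listed in `RNL_MIX` are our reading of the mixture in Figure 9.7 and should be treated as assumptions; only the 75% share of the slightly blurred rendering is stated in the text. The 1D blur and the function names are likewise illustrative and are not the code used for the film.

```python
import math

# (weight, standard deviation in pixels); assumed values, see the note above.
RNL_MIX = [(3 / 4, 1), (1 / 10, 15), (1 / 10, 50), (1 / 20, 150)]

def gaussian_blur_1d(signal, sigma):
    """Blur a scan line with a normalized Gaussian, clamping at the borders."""
    radius = int(3 * sigma)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    n = len(signal)
    blurred = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += weight * signal[j]
        blurred.append(acc / total)
    return blurred

def mix_blurs(signal, mix=RNL_MIX):
    """Weighted average of differently blurred copies of the same signal."""
    result = [0.0] * len(signal)
    for weight, sigma in mix:
        blurred = gaussian_blur_1d(signal, sigma)
        result = [r + weight * b for r, b in zip(result, blurred)]
    return result

scanline = [0.2] * 300
scanline[150] = 500.0  # one very bright HDR pixel, such as a specular highlight
mixed = mix_blurs(scanline)
```

Because the weights sum to 1, uniformly lit regions keep their brightness, whereas an isolated bright pixel spreads some of its energy over its neighborhood.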

With the right techniques, such blurring processes can be performed very 
efficiently, even though convolving images with wide filters is normally 
computationally expensive. First, Gaussian filters are separable, so that the filter 
can be performed as a 1D Gaussian blur in x followed by a 1D Gaussian blur in y. 
Second, a Gaussian blur can be closely approximated as several successive box blurs. 
Finally, the wider blurs can be closely approximated by applying correspondingly 
narrower blurring filters to a low-resolution version of the image, and then 
up-sampling the small blurred version. With these enhancements, efficient 
implementations of such techniques can be achieved on modern GPUs [187], and 
considerably more elaborate lens flare effects can be performed in real time [176]. 

$$I' = \tfrac{3}{4}\,\mathrm{Blur}(I, 1) + \tfrac{1}{10}\,\mathrm{Blur}(I, 15) + \tfrac{1}{10}\,\mathrm{Blur}(I, 50) + \tfrac{1}{20}\,\mathrm{Blur}(I, 150)$$

FIGURE 9.7 The mix of blur filters used to postprocess the RNL frames. Blur(I, σ) indicates 
a Gaussian blur with a standard deviation of σ pixels. 
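
The second of these observations can be checked with a few lines of Python. The sketch builds the kernel that results from three successive box blurs and compares it with a sampled Gaussian of the same standard deviation. The box width of 9 pixels, the three passes, and the function names are arbitrary choices for this illustration.

```python
import math

def convolve_full(a, b):
    """Full discrete convolution of two short sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def repeated_box_kernel(width, passes):
    """Kernel equivalent to applying a box blur of `width` pixels `passes` times."""
    box = [1.0 / width] * width
    kernel = [1.0]
    for _ in range(passes):
        kernel = convolve_full(kernel, box)
    return kernel

def kernel_sigma(kernel):
    """Standard deviation of a normalized kernel, in pixels."""
    mean = sum(i * w for i, w in enumerate(kernel))
    return math.sqrt(sum(w * (i - mean) ** 2 for i, w in enumerate(kernel)))

def gaussian_samples(sigma, length):
    """Gaussian density sampled at integer positions, centered on the kernel."""
    center = (length - 1) / 2.0
    norm = sigma * math.sqrt(2.0 * math.pi)
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) / norm
            for i in range(length)]

approx = repeated_box_kernel(width=9, passes=3)
sigma = kernel_sigma(approx)
exact = gaussian_samples(sigma, len(approx))
max_error = max(abs(a - e) for a, e in zip(approx, exact))
```

Each box of width w contributes a variance of (w^2 - 1)/12, so the standard deviation of the combined kernel is sqrt(passes (w^2 - 1)/12). A box blur can be computed with a running sum at a cost that does not depend on its width, which is what makes this approximation attractive for wide filters.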

Postprocessed frames from RNL can be seen in the right-hand column of Fig- 
ure 9.4. Because the final postprocessed images are 75% composed of the original 
renderings with just a slight blur applied, the original image detail is still evident. 
However, because of the other blurred versions added, the bright parts of the envi- 
ronment and their specular reflections tend to bloom in the final renderings. Often, 
as in Figure 9.7, the bloom from bright spots in the environment appears to “wrap” 
around objects in the foreground. This is a surprisingly natural effect that helps the 
rendered objects appear to belong within the rest of the scene. 

As mentioned previously, this postprocessing of HDR imagery is a form of tone 
mapping, the effects of which are similar to the results produced by using “soft 
focus,” “mist,” and “fog” filters on real camera lenses. The effects are also similar to 
the effects of the optical imperfections of the human eye. A detailed model of the 
particular glare and bloom effects produced in the human eye is constructed and 
simulated by Spencer et al. [198]. In addition, a basic model of human eye glare 
was used in conjunction with a tone-reproduction operator by Larson et al. [180]. 

A final subtle effect applied to the renderings in RNL is vignetting. Vignetting is 
the process of gently darkening the pixel values of an image toward its corners, 
which occurs naturally in many camera lenses (particularly at wide apertures) and 
is sometimes intentionally exaggerated with an additional mask or iris for photo- 
graphic effect. Applying this effect to an HDR image before tone mapping the pixel 
values can also help communicate a greater sense of the range of light in a scene, 
particularly in animations. With this effect, as a bright region moves from the center 
of the field of view to the edge, the pixels around it dim. However, its particularly 
bright pixels will still reach the white point of the display. Thus, different exposures 
of the scene are revealed in a natural manner simply through camera motion. The 
effect is easily achieved by multiplying an image by a brightness falloff image such 
as in Figure 9.8(a). Figures 9.8(b) and 9.8(c) show a rendering from RNL before 
and after vignetting. 


FIGURE 9.8 Via the brightness fall-off function shown in image (a), a frame from the RNL 
animation is seen (b) before and (c) after vignetting. In this image, this operation darkens the 
right-side image corners, reveals detail in the slightly overexposed upper left corner, and has little 
effect on the extremely bright lower left corner. In motion, this vignetting effect helps communicate 
the HDR environment on an LDR display. 
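
Vignetting of this kind is a per-pixel multiplication, as the following sketch shows. The quadratic falloff that darkens the corners to half their value is a hypothetical choice and not the falloff image used for RNL; the tiny 5-by-5 test image and the function names are likewise assumptions.

```python
def falloff(x, y, width, height, strength=0.5):
    """Brightness factor: 1 at the image center, 1 - strength at the corners.

    Assumes width and height are both greater than 1.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    r2 = ((x - cx) ** 2 + (y - cy) ** 2) / (cx * cx + cy * cy)
    return 1.0 - strength * r2

def vignette(image, strength=0.5):
    """Multiply an HDR image (list of rows) by the falloff function."""
    height, width = len(image), len(image[0])
    return [[image[y][x] * falloff(x, y, width, height, strength)
             for x in range(width)]
            for y in range(height)]

def display(image, white=1.0):
    """Clip to the display white point after vignetting."""
    return [[min(v, white) for v in row] for row in image]

image = [[0.5] * 5 for _ in range(5)]
image[0][0] = 1.4    # slightly overexposed corner
image[4][0] = 50.0   # extremely bright corner

vignetted = vignette(image)
shown = display(vignetted)
```

Applied before clipping, the falloff brings the slightly overexposed corner back below the white point and so reveals its detail, whereas the extremely bright corner remains at white.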


As an early IBL example, RNL differed from most CG animations in that de- 
signing the lighting in the scene was a matter of choosing real light from a 
real location rather than constructing the light as an arrangement of computer- 
generated light sources. Using global illumination to simulate the image-based 
lighting naturally produced shading, highlights, refractions, and shadows that were 
consistent with one another and with the environment surrounding the scene. 
With traditional CG lighting, such an appearance would have been difficult to 
achieve. 


9.3 CAPTURING LIGHT PROBE IMAGES 


Moving on from our basic example, the next few sections present the key stages 
of IBL with greater generality and detail, beginning with the process of lighting 
capture. Capturing the incident illumination at a point in space requires taking an 
image with two properties. First, it must see in all directions, in that light coming 
from anywhere can affect the appearance of an object. Second, it must capture the
full dynamic range of the light within the scene, from the brightest concentrated 
light sources to the dimmer but larger areas of indirect illumination from other 
surfaces in the scene. In many cases, the standard HDR photography techniques 
presented in Chapter 4 satisfy this second requirement. Thus, the remaining chal- 
lenge is to acquire images that see in all directions, a process known as panoramic 
(or omnidirectional) photography. There are several methods of recording images 
that see in all directions, each with advantages and disadvantages. In this section, 
we describe some of the most common techniques, which include using mirrored 
spheres, tiled photographs, fish-eye lenses, and scanning panoramic cameras. This 
section concludes with a discussion of how to capture light probe images that in- 
clude a direct view of the sun, which is usually too bright to record with standard 
HDR image capture techniques. 


9.3.1 PHOTOGRAPHING A MIRRORED SPHERE 


The Rendering with Natural Light light probe was captured by photographing a
mirrored sphere placed at the point in the scene where the illumination was to be
measured. Mirrored spheres were first used to obtain omnidirectional reflections of an
environment for environment mapping [186,207,171] (described in Section 9.8.1),
where such images are directly texture mapped onto surfaces of objects.
The main benefit of photographing a mirrored sphere is that it reflects very nearly 
the entire environment in a single view. Aside from needing two tripods (one for 
the sphere and one for the camera), capturing a light probe image with this tech- 
nique can be fast and convenient. Mirrored spheres are inexpensively available as 2- 
to 4-inch diameter chrome ball bearings (available from the McMaster-Carr catalog,
www.mcmaster.com), 6- to 12-inch mirrored glass lawn ornaments (available from
Baker’s Lawn Ornaments, www.bakerslawnorn.com), and Chinese meditation balls (1.5 
to 3 inches). Dubé juggling equipment (www.dube.com) sells polished hollow chrome 
spheres from 2-1/6 to 2-7/8 inches in diameter. Professionally manufactured mir- 
rored surfaces with better optical properties are discussed at the end of this section. 
There are several factors that should be considered when acquiring light probe 
images with a mirrored sphere. These are discussed in the sections that follow. 
Framing and Focus First, it is desirable to have the sphere be relatively far from the
camera to minimize the size of the camera’s reflection and to keep the view nearly 
orthographic. To have the sphere be relatively large in the frame, it is necessary 
to use a long-focal-length lens. Many long lenses have difficulty focusing closely 
on small objects, and thus it may be necessary to use a +1 diopter close-up filter 
(available from a professional photography store) on the lens to bring the sphere 
into focus. The image of the sphere usually has a shallow depth of field, especially 
when a close-up filter is used, and thus it is often necessary to use an aperture of 
f/8 or smaller to bring the full image into focus. 


Blind Spots There are several regions in a scene that are usually not captured 
well by a mirrored sphere. One is the region in front of the sphere, which re- 
flects the camera and often the photographer. Another is the region directly behind 
the sphere, which is reflected by a thin area around the sphere’s edge. The last is 
a strip of area from straight down and connecting to the area straight back, which 
usually reflects whatever supports the sphere. For lighting capture, these effects are 
easily minimized by orienting the camera so that no photometrically interesting ar- 
eas of the scene (e.g., bright light sources) fall within these regions. However, it is 
sometimes desirable to obtain clear images of all directions in the environment, for 
example when the light probe image itself will be seen in the background of the 
scene. To do this, one can take two HDR images of the mirrored sphere, with the 
second rotated 90 degrees around from the first. In this way, the poorly represented 
forward and backward directions of one sphere correspond to the well-imaged left 
and right directions of the other, and vice versa. The two images taken for the RNL 
light probe are shown in Figure 9.9. Each image slightly crops the top and bottom 
of the sphere, which was done intentionally to leverage the fact that these areas 
belong to the rear half of the environment that appears in the other sphere image. 
Using an HDR image-editing program such as HDR Shop [202], these two images 
can be combined into a single view of the entire environment that represents all di- 
rections well except straight down. If needed, this final area could be filled in from 
a photograph of the ground or through manual image editing. 


Calibrating Sphere Reflectivity It is important to account for the fact that mir- 
rored spheres are generally not optically perfect reflectors. Though the effect is often 
unnoticed, ball bearings typically reflect only a bit more than half of the light hitting 


FIGURE 9.9 Acquiring a light probe with a mirrored sphere. (a) and (b) show two images of
the sphere, taken 90 degrees apart. (c) and (d) show the mirrored sphere images transformed into 
latitude-longitude mappings. In (d), the mapping has been rotated 90 degrees to the left to line 
up with the mapping of (c). The black teardrop shapes correspond to the cropped regions at the 
top and bottom of each sphere, and the pinched area between each pair of drops corresponds to the 
poorly sampled region near the outer edge of the sphere image. Each sphere yields good image data 
where the other one has artifacts, and combining the best regions of each can produce a relatively 
seamless light probe image, as in Figure 9.2. 
them. In some IBL applications, the lighting is captured using a mirrored sphere,
and the background image of the scene is photographed directly. To correct the 
sphere image so that it is photometrically consistent with the background, we need 
to measure the reflectivity of the sphere. This can be done using a setup such as 
that shown in Figure 9.10. In this single photograph, taken with a radiometrically 
calibrated camera, the indicated patch of diffuse paper is reflected in the ball. We 
can thus divide the average pixel value of the patch in the reflection by the average 
pixel value in the direct view to obtain the sphere’s percent reflectivity in each of 
the red, green, and blue channels. A typical result would be (0.632, 0.647, 0.653). 
Often, these three numbers will be slightly different, indicating that the sphere tints 
the incident light. The light probe image can be corrected to match the background 
by dividing its channels by each of these numbers. 
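
As a rough sketch of this calibration, the per-channel division can be written as follows. The pixel samples are hypothetical linear values, and the function names are illustrative only.

```python
def channel_means(pixels):
    """Average a list of linear (r, g, b) pixel values per channel."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def sphere_reflectivity(patch_reflected, patch_direct):
    """Reflectivity per channel: mean of the patch as seen in the sphere
    divided by the mean of the same patch seen directly."""
    reflected = channel_means(patch_reflected)
    direct = channel_means(patch_direct)
    return tuple(r / d for r, d in zip(reflected, direct))

def correct_probe(probe_pixels, reflectivity):
    """Divide each channel of the light probe by the sphere's reflectivity."""
    return [tuple(v / k for v, k in zip(px, reflectivity)) for px in probe_pixels]

# Hypothetical samples: the paper patch reads 0.5 directly and
# (0.316, 0.3235, 0.3265) in the sphere's reflection.
rho = sphere_reflectivity([(0.316, 0.3235, 0.3265)] * 4, [(0.5, 0.5, 0.5)] * 4)
```

Both views of the patch must come from linear (radiometrically calibrated) pixel values; the ratio is meaningless on gamma-encoded data.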


FIGURE 9.10 The reflectivity of a mirrored sphere can be determined by placing the sphere 
near an easy-to-identify part of a diffuse surface, and comparing its brightness seen directly to its 
appearance as reflected in the sphere. In this case, the sphere is 59% reflective. 




Nonspecular Reflectance Mirrored spheres usually exhibit a faint diffuse or 
rough specular component due to microscopic scratches and deposits on their sur- 
face. It is best to keep the spheres in cloth bags to minimize such scratching, as well 
as to keep them dry so that the surface does not oxidize. A slight rough specular 
component usually makes little difference in how light probe images illuminate CG 
objects, but when viewed directly the reflected image may lack contrast in dark re- 
gions and exhibit bright flares around the light sources. If an application requires a 
near-perfectly shiny mirrored surface, one can have a glass or metal sphere or hemi- 
sphere specially coated with a thin layer of aluminum by an optical coating company 
(only half a sphere can be photographed at once, and thus in practice a hemisphere 
can also be used as a light probe). Such coated optics yield extremely clear specular 
reflections and can be up to 91% reflective. Some experiments in capturing the sky 
using such optics can be found in Stumpfel [199]. 


Polarized Reflectance Mirrored spheres behave somewhat unexpectedly with re- 
spect to polarization. Light that reflects from a sphere at angles next to the outer 
rim becomes polarized, an effect characterized by Fresnel’s equations [168]. Cam- 
era sensors generally record light irrespective of polarization, so this itself is not a 
problem.2 However, for the same reasons, polarized light reflecting from a mirrored
sphere can appear either too bright or too dim compared to being viewed directly. 
This is a significant effect in outdoor environments, where the scattered blue light 
of the sky is significantly polarized. This problem can be substantially avoided by 
using highly reflective mirror coatings (as discussed previously). 


Image Resolution It can be difficult to obtain a particularly high-resolution image 
of an environment as reflected in a mirrored sphere, because only one image is used 
to cover a fully spherical field of view. For lighting CG objects, the need for highly 
detailed light probe images is not great: only large and very shiny objects reflect light 
in a way that fine detail in an environment can be noticed. However, for forming 
virtual backgrounds behind CG objects it is often desirable to have higher-resolution 
imagery. In the RNL animation, the low resolution of the light probe image used 


2 Whereas sensors tend to detect different polarization directions equally, wide-angle lenses can respond differently ac- 
cording to polarization for regions away from the center of the image. 
for the background produced the reasonably natural appearance of a shallow depth
of field, even though no depth of field effects were simulated. 

Photographing a mirrored sphere is an example of a catadioptric imaging system in 
that it involves both a lens and a reflective surface. In addition to mirrored spheres, 
other shapes can be used that yield different characteristics of resolution, depth of 
field, and field of view. One example of a well-engineered omnidirectional video 
camera covering slightly over a hemispherical field of view is presented in Nayar
[188]. Nonetheless, for capturing illumination wherein capturing the full sphere is 
more important than high image resolution, mirrored spheres are often the most 
easily available and convenient method. 


9.3.2 TILED PHOTOGRAPHS 


Omnidirectional images can also be captured by taking a multitude of photographs 
looking in different directions and “stitching” them together — a process made fa- 
miliar by QuickTime VR panoramas [156]. This technique can be used to assemble 
remarkably high-resolution omnidirectional images using a standard camera and 
lens. Unfortunately, the most commonly acquired panoramas see all the way around 
the horizon but only with a limited vertical field of view. For capturing lighting, it is 
important to capture imagery looking in all directions, particularly upward, because 
this is often where much of the light comes from. Images taken to form an omnidi- 
rectional image will align much better if the camera is mounted on a nodal rotation 
bracket, which can eliminate viewpoint parallax between the various views of the 
scene. Such brackets are available commercially from companies such as Kaidan 
(www.kaidan.com). Some models allow the camera to rotate around its nodal center 
for both the horizontal and vertical axes. 

Figure 9.11 shows tone-mapped versions of the source HDR images for the 
“Uffizi Gallery” light probe image [160], which was acquired as an HDR tiled 
panorama. These images were aligned by marking pairs of corresponding points 
between the original images and then solving for the best 3D rotation of each image 
to minimize the distance between the marked points. The images were then blended 
across their edges to produce the final full-view latitude-longitude mapping. This 
image was used as the virtual set and lighting environment for the middle sequence 
of the animation Fiat Lux. 
FIGURE 9.11 (a) The Uffizi light probe was created from an HDR panorama with two rows
of nine HDR images. (b) The assembled light probe in latitude-longitude format. (c) Synthetic
objects added to the scene, using the light probe as both the virtual background and the lighting
environment.
A more automatic algorithm for assembling such full-view panoramas is described
in [201], and commercial image-stitching products such as QuickTime VR
(www.apple.com/quicktime/qtvr/) and Realviz Stitcher (www.realviz.com) allow one to in- 
teractively align images taken in different directions to produce full-view panora- 
mas in various image formats. Unfortunately, at the time of writing no commercial 
products natively support stitching HDR images. Digital photographer Greg Downing
(www.gregdowning.com) has described a process [166] for stitching each set of
equivalent exposures across the set of HDR images into its own panorama, and then 
assembling this series of LDR panoramas into a complete light probe image. The 
key is to apply the same alignment to every one of the exposure sets: if each set 
were aligned separately, there would be little chance of the final stitched panoramas 
aligning well enough to be assembled into an HDR image. To solve this problem, 
Downing uses Realviz Stitcher to align the exposure level set with the most image 
detail and saves the alignment parameters in a way that they can be applied identi- 
cally to each exposure level across the set of views. These differently exposed LDR 
panoramas can then be properly assembled into an HDR panorama. 


9.3.3 FISH-EYE LENSES 


Fish-eye lenses are available for most single-lens reflex cameras and are capable of 
capturing 180 degrees or more of an environment in a single view. As a result, they 
can cover the full view of an environment in as few as two images. In Greene [171], 
a fish-eye photograph of the sky was used to create the upper half of a cube map 
used as an environment map. Although fish-eye lens images are typically not as sharp 
as regular photographs, light probe images obtained using fish-eye lenses are usu- 
ally of higher resolution than those obtained by photographing mirrored spheres. 
A challenge in using fish-eye lenses is that not all 35-mm digital cameras capture the 
full field of view of a 35-mm film camera due to having a smaller image sensor. In 
this case, the top and bottom of the circular fish-eye image are usually cropped off.
This can require taking additional views of the scene to cover the full environment. 
Fortunately, recently available digital cameras, such as the Canon EOS 1Ds and the 
Kodak DCS 14n, have image sensor chips that are the same size as 35-mm film (and 
no such cropping occurs). 

Fish-eye lenses can exhibit particularly significant radial intensity fall-off, also 
known as vignetting. As with other lenses, the amount of falloff tends to increase with 
the size of the aperture being used. For the Sigma 8-mm fish-eye lens, the amount
of falloff from the center to the corners of the image is more than 50% at its widest 
aperture of f/4. The falloff curves can be calibrated by taking a series of photographs 
of a constant intensity source at different camera rotations and fitting a function to 
the observed image brightness data. This surface can be used to render an image of 
the flat-field response of the camera. With this image, any HDR image obtained with 
the camera can be made radiometrically consistent across its pixels by dividing by 
the image of the flat-field response. An example of this process is described in more 
detail in [200]. 
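
A minimal sketch of such a flat-field correction follows. It assumes a one-parameter radial model f(r) = 1 - k r^2, which is a simplification chosen for illustration; the calibration in [200] may fit a different function. Here r is the normalized distance from the image center and the sample brightness values are hypothetical.

```python
def fit_falloff(samples):
    """Least-squares fit of f(r) = 1 - k*r^2 to (r, relative_brightness) pairs.
    Setting the derivative of sum((b - (1 - k*r^2))^2) to zero gives
    k = sum(r^2 * (1 - b)) / sum(r^4)."""
    numerator = sum(r ** 2 * (1.0 - b) for r, b in samples)
    denominator = sum(r ** 4 for r, _ in samples)
    return numerator / denominator

def flat_field(r, k):
    """Modeled relative response of the lens at normalized radius r."""
    return 1.0 - k * r * r

def correct_pixel(value, r, k):
    """Divide a linear pixel value by the flat-field response at its radius."""
    return value / flat_field(r, k)

# Hypothetical measurements of a constant-intensity source at three radii.
k = fit_falloff([(0.0, 1.0), (0.5, 0.875), (1.0, 0.5)])
```

In practice a separate curve would be fitted for each aperture, because the amount of falloff changes with the f-stop.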


9.3.4 SCANNING PANORAMIC CAMERAS 


Scanning panoramic cameras are capable of capturing particularly high-resolution 
omnidirectional HDR images. These cameras use narrow image sensors that are typ- 
ically 3 pixels wide and several thousand pixels tall. The three columns of pixels are 
filtered by red, green, and blue filters, allowing the camera to sense color. A precise 
motor rotates the camera by 360 degrees over the course of a few seconds to a few 
minutes, capturing a vertical column of the panoramic image many times per sec- 
ond. When a fish-eye lens is used, the full 180-degree vertical field of view can be 
captured from straight up to straight down. Two cameras based on this process are 
made by Panoscan and Spheron VR (Figure 9.12). 

Trilinear image sensors, having far fewer pixels than area sensors, can be de- 
signed with more attention given to capturing a wide dynamic range in each expo- 
sure. Nonetheless, for IBL applications in which it is important to capture the full 
range of light, including direct views of concentrated light sources, taking multiple 
exposures is usually still necessary. The Panoscan camera’s motor is able to precisely 
rewind and repeat its rotation, allowing multiple exposures to be taken and assem- 
bled into an HDR image without difficulty. The Spheron VR camera (see also Section 
4.9.4) can be ordered with a special HDR feature in which the image sensor rapidly 
captures an HDR series of exposures for each column of pixels as the camera head 
rotates. These differently exposed readings of each pixel column can be assembled 
into HDR images from just one rotation of the camera. 

For these cameras, the speed of scanning is limited by the amount of light in 
the scene. Suppose that at the chosen f-stop and ISO setting it takes 1/125 of a second
to obtain a proper exposure of the shadows and midtones of a scene. If the


FIGURE 9.12 (a) The Spheron scanning panoramic camera. (b) A tone-mapped version of a
high-resolution omnidirectional HDR image taken with the camera. Panorama courtesy of Ted 
Chavalas of Panoscan, Inc. 


image being acquired is 12,000 pixels wide, the camera must take at least 96 sec- 
onds to scan the full panorama. Fortunately, shooting the rest of the HDR image 
series to capture highlights and light sources takes considerably less time because 
each shutter speed is shorter, typically at most one-fourth the length for each additional
exposure. Although capturing lighting with scanning panoramic cameras is
not instantaneous, the resulting lighting environments can have extremely detailed 
resolution. The high resolution also enables using these images as background plates 
or to create image-based models for 3D virtual sets. 
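
The scan-time arithmetic can be checked with a short sketch. It assumes one column is exposed at a time, one full rotation per exposure level, and no overhead for rewinding the motor; the function names are illustrative.

```python
from fractions import Fraction

def scan_time(columns, column_exposure):
    """Minimum seconds to scan a panorama one pixel column at a time."""
    return columns * column_exposure

def hdr_series_time(columns, longest_exposure, n_exposures, ratio=Fraction(1, 4)):
    """Total seconds for n_exposures rotations, each exposure `ratio` times
    the previous one (one-fourth, per the text). Rewind time is ignored."""
    return sum(scan_time(columns, longest_exposure * ratio ** i)
               for i in range(n_exposures))

base = scan_time(12000, Fraction(1, 125))             # 96 seconds
total = hdr_series_time(12000, Fraction(1, 125), 3)   # 96 + 24 + 6 seconds
```

Under these assumptions, two additional exposures add only 30 seconds to the 96-second base scan.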


9.3.5 CAPTURING ENVIRONMENTS WITH VERY 
BRIGHT SOURCES 


For image-based lighting, it is important that the acquired light probe images cover 
the full dynamic range of light in the scene up to and including light sources. If 
99.99% of a light probe image is recorded properly but 0.01% of the pixel values 
are saturated, the light captured could still be very inaccurate depending on how 
bright the saturated pixels really should have been. Concentrated light sources are 
often significantly brighter than the average colors within a scene. In a room lit by 
a bare light bulb, the light seen reflecting from tens of square meters of ceiling, 
floor, and walls originates from just a few square millimeters of light bulb filament. 
Because of such ratios, light sources are often thousands, and occasionally millions, 
of times brighter than the rest of the scene. 
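
A small numeric illustration, under simplifying assumptions, shows how much light a few clipped pixels can hide. Every pixel is given equal weight (solid-angle weighting is ignored), the unsaturated pixels all have value 1, and the true source value of 50,000 is hypothetical.

```python
def captured_fraction(n_pixels, n_saturated, true_value, clip=1.0, background=1.0):
    """Fraction of the total light that survives clipping, treating every
    pixel as covering the same solid angle (an assumption)."""
    n_ok = n_pixels - n_saturated
    recorded = n_ok * background + n_saturated * min(true_value, clip)
    actual = n_ok * background + n_saturated * true_value
    return recorded / actual

# 0.01% of 10,000 pixels is a single pixel; suppose it should read 50,000.
fraction = captured_fraction(10000, 1, 50000.0)
```

In this example the probe records only about one-sixth of the light actually present, even though 99.99% of its pixels are correct.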

In many cases, the full dynamic range of scenes with directly visible light sources 
can still be recovered using standard HDR photography techniques, in that camera 
shutter speeds can usually be varied down to 1/4000 of a second or shorter, and small
apertures can be used as well. Furthermore, modern lighting design usually avoids 
having extremely concentrated lights (such as bare filaments) in a scene, preferring 
to use globes and diffusers to more comfortably spread the illumination over a wider 
area. However, for outdoor scenes the sun is a light source that is both very bright 
and very concentrated. When the sun is out, its brightness can rarely be recorded 
using a typical camera even using the shortest shutter speed, the smallest aperture, 
and the lowest sensor gain settings. The sun’s brightness often exceeds that of the 
sky and clouds by a factor of fifty thousand, which is difficult to cover using varying 
shutter speeds alone. 

Stumpfel et al. [200] presented a technique for capturing light from the sky up 
to and including the sun. To image the sky, the authors used a Canon EOS 1Ds
digital camera with a Sigma 8-mm fish-eye lens facing upward on the roof of an 
office building (Figure 9.13(a)). The lens glare caused by the sun was minor, which
was verified by photographing a clear sky twice and blocking the sun in one of the
images. For nearly the entire field of view, the pixel values in the sky were within a
few percentage points of each other in both images.

As expected, the sun was far too bright to record even using the camera’s shortest 
shutter speed of 1/8000 of a second at f/16, a relatively small aperture.3 The authors
thus placed a Kodak Wratten 3.0 neutral density (ND) filter on the back of the lens 
to uniformly reduce the light incident on the sensor by a factor of one thousand.4 
ND filters are often not perfectly neutral, giving images taken through them a significant
color cast. The authors calibrated the transmission of the filter by taking
HDR images of a scene with and without the filter, and divided the two images to 
determine the filter’s transmission in the red, green, and blue color channels. All 
images subsequently taken through the filter were scaled by the inverse of these 
transmission ratios. 
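
The filter calibration can be sketched as follows. A filter of optical density d nominally transmits 10^-d of the light, so a 3.0 ND filter passes one thousandth. The measured per-channel values below are hypothetical and only illustrate a filter that is not perfectly neutral.

```python
def nominal_transmission(density):
    """Ideal transmission of a neutral density filter of the given optical density."""
    return 10.0 ** (-density)

def measured_transmission(filtered_px, unfiltered_px):
    """Per-channel transmission from linear HDR pixel values of the same
    scene point photographed with and without the filter."""
    return tuple(f / u for f, u in zip(filtered_px, unfiltered_px))

def undo_filter(pixel, transmission):
    """Scale a pixel taken through the filter by the inverse transmission."""
    return tuple(v / t for v, t in zip(pixel, transmission))

# Hypothetical measurement: the filter passes slightly more red than blue.
t = measured_transmission((0.0012, 0.0010, 0.0008), (1.0, 1.0, 1.0))
restored = undo_filter((0.0012, 0.0010, 0.0008), t)
```

In a real calibration the ratio would be averaged over many pixels of the two HDR images rather than taken from a single point.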

Having the 3.0 ND filter on the lens made it possible to image the sun at 1/8000 of
a second at f/16 without saturating the sensor (see Figure 9.13(a)), but it made the
sky and clouds require an undesirably long exposure time of 15 seconds. To solve 
this problem, the authors used a laptop computer to control the camera so that 
both the shutter speed and the aperture could be varied during each HDR image
sequence. Thus, the series began at f/4 with exposures of 1, 1/4, and 1/32 second and
then switched to f/16 with exposures of 1/16, 1/125, 1/1000, and 1/8000 of a second. Images
from such a sequence are seen in Figures 9.13(b) through 9.13(h). For pre-sunrise 
and post-sunset images, the f/16 images were omitted and an additional exposure 
of 4 seconds at f/4 was added to capture the dimmer sky of dawn and dusk. 
Creating HDR images using images taken with different apertures is slightly more 
complicated than usual. Because different apertures yield different amounts of lens 
vignetting, each image needs to be corrected for its aperture’s flat-field response 
(see Section 9.3.3) before the HDR assembly takes place. In addition, whereas the 
actual exposure ratios of different camera shutter speeds typically follow the expected
geometric progression (1/30 second is usually precisely half the exposure of


3 The authors observed unacceptably pronounced star patterns around the sun at the smaller apertures of f/22 and f/32 
due to diffraction of light from the blades of the iris. 


4 Because the fish-eye lens has such a wide-angle view, filters are placed on the back of the lens using a small mounting 
bracket rather than on the front. 
FIGURE 9.13 (a) A computer-controlled camera with a fish-eye lens placed on a roof to capture
HDR images of the sky. A color-corrected HDR image series (b through h) of a sunny sky taken
using varying shutter speed, aperture, and a 3.0 neutral density filter. The inset of each frame
shows a small area around the sun, with saturated pixels shown in pink. Only the least exposed
image (h) accurately records the sun's intensity without sensor saturation.
Panel exposures: (b) 1 sec., f/4; (c) 1/4 sec., f/4; (d) 1/32 sec., f/4.


1/15 second; 1/15 second is usually precisely half the exposure of 1/8 second), aperture
transmission ratios are less exact. In theory, images taken at f/4 should receive
sixteen times the exposure of images taken at f/16, but generally do not.




Panel exposures (continued): (e) 1/16 sec., f/16; (f) 1/125 sec., f/16; (g) 1/1000 sec., f/16;
(h) 1/8000 sec., f/16.


To test this, the authors took images of a constant intensity light source at both 
f/4 and f/16 and compared the pixel value ratios at the center of the image, mea- 
suring a factor of 16.63 rather than 16, and compensated for this ratio accord- 
ingly. 
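
One way to fold the measured aperture ratio into HDR assembly is to express every frame's exposure relative to f/16. The sketch below assumes linear pixel values that have already been flat-field corrected; the table of gains holds only the 16.63 figure quoted above, and the function names are illustrative.

```python
# Exposure gain of each aperture relative to f/16. The f/4 entry is the
# measured value from the text, not the ideal (16 / 4) ** 2 = 16.
MEASURED_GAIN_VS_F16 = {16: 1.0, 4: 16.63}

def ideal_aperture_ratio(n_wide, n_narrow):
    """Theoretical exposure ratio between two f-numbers."""
    return (n_narrow / n_wide) ** 2

def relative_exposure(shutter_s, f_number):
    """Exposure of a frame in units of 'one second at f/16'."""
    return shutter_s * MEASURED_GAIN_VS_F16[f_number]

def to_radiance(pixel_value, shutter_s, f_number):
    """Bring a linear, unsaturated pixel value to a common radiance scale."""
    return pixel_value / relative_exposure(shutter_s, f_number)

# The same scene point should map to the same value from either aperture.
wide = to_radiance(16.63, 1.0, 4)
narrow = to_radiance(1.0, 1.0, 16)
```

Using the ideal ratio of 16 here would leave a brightness step of about 4% between pixels recovered from the f/4 frames and those recovered from the f/16 frames.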

When the sun was obscured by clouds or was low in the sky, the images with 
the shortest exposure times were not required to cover the full dynamic range. 
The authors discovered they could adaptively shoot the HDR image sequence by
programming the laptop computer to analyze each image in the series after it was 
taken. If one of the images in the series was found to have no saturated pixels, no 
additional photographs were taken. 
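
The adaptive loop can be sketched as below. The `shoot` callable is a hypothetical stand-in for whatever camera-control call the laptop issues, and the simulated camera, which clips linear values at 1.0, exists only to make the example self-contained.

```python
def capture_adaptive(exposures, shoot, saturation=1.0):
    """Shoot from longest to shortest exposure and stop after the first
    frame that contains no saturated pixels."""
    frames = []
    for exposure in exposures:
        frame = shoot(exposure)
        frames.append((exposure, frame))
        if max(frame) < saturation:
            break  # nothing clipped: shorter exposures add no information
    return frames

def make_simulated_camera(scene):
    """Hypothetical camera: pixel = radiance * exposure, clipped at 1.0."""
    def shoot(exposure):
        return [min(radiance * exposure, 1.0) for radiance in scene]
    return shoot

exposures = [1.0, 1 / 4, 1 / 32, 1 / 256]
overcast = capture_adaptive(exposures, make_simulated_camera([0.5, 3.0]))
sunny = capture_adaptive(exposures, make_simulated_camera([0.5, 40.0]))
```

In this simulation the dimmer scene needs two frames and the brighter one needs all four.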


Indirectly Deriving Sun Intensity from a Diffuse Sphere Rather than using 
specialized HDR photography to directly capture the sun’s intensity, we can estimate 
the sun’s intensity based on the appearance of a diffuse object within the scene. In 
particular, we can use a mirrored sphere to image the ground, sky, and clouds and 
a diffuse sphere to indirectly measure the intensity of the sun. Such an image pair 
is shown in Figure 9.14. 

The mirrored sphere contains accurate pixel values for the entire sky and sur- 
rounding environment, except for the region of sensor saturation near the sun. 
Because of this, the clipped light probe will not accurately illuminate a synthetic 
object as it would really appear in the real environment; it would be missing the 
light from the sun. We can quantify the missing light from the sun region by com- 
paring the real and synthetic diffuse sphere images. 

To perform this comparison, we need to adjust the images to account for each 
sphere’s reflectivity. For the diffuse sphere, a gray color can be preferable to white 
because it is less likely to saturate the image sensor when exposed according to the 
average light in the environment. The paint’s reflectivity can be measured by paint- 
ing a flat surface with the same paint and photographing this sample (with a radio- 
metrically calibrated camera) in the same lighting and orientation as a flat surface of 
known reflectance. Specialized reflectance standards satisfying this requirement are avail- 
able from optical supply companies. These reflect nearly 99% of the incident light — 
almost perfectly white. More economical reflectance standards are the neutral-toned 
squares of a Gretag-MacBeth ColorChecker chart (www.gretagmacbeth.com), whose
reflectivities are indicated on the back of the chart. Let us call the reflectivity of our
standard $\rho_{\text{standard}}$, the pixel color of the standard in our image $L_{\text{standard}}$, and the pixel
color of the paint sample used for the diffuse sphere $L_{\text{paint}}$. Then the reflectivity
$\rho_{\text{paint}}$ of the paint is simply:

$$\rho_{\text{paint}} = \rho_{\text{standard}} \, \frac{L_{\text{paint}}}{L_{\text{standard}}}$$
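
The formula translates directly into code. In this sketch the standard's reflectivity of 0.99 and all pixel values are hypothetical, and the second helper shows the follow-on step of dividing the diffuse sphere's pixels by the paint reflectivity.

```python
def paint_reflectivity(rho_standard, l_paint, l_standard):
    """rho_paint = rho_standard * L_paint / L_standard, per color channel."""
    return tuple(rho_standard * lp / ls for lp, ls in zip(l_paint, l_standard))

def as_if_white(pixel, rho_paint):
    """Pixel value the sphere would show if it were 100% reflective."""
    return tuple(v / r for v, r in zip(pixel, rho_paint))

# Hypothetical linear pixel values for the paint sample and the standard,
# photographed in the same lighting and orientation.
rho_paint = paint_reflectivity(0.99, (0.160, 0.1665, 0.173), (0.495, 0.495, 0.495))
white = as_if_white((0.160, 0.1665, 0.173), rho_paint)
```

Because the paint sample and the standard share the same lighting and orientation, the unknown illumination cancels in the ratio.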
FIGURE 9.14 (a) A mirrored sphere photographed as a single LDR exposure with direct
sunlight. The bright sun becomes a saturated region of pixels clipped to white. (b) A gray diffuse sphere
photographed in the same illumination, held several inches in front of the mirrored sphere. (c) The
cropped, reflectance-calibrated, and clipped mirrored sphere image $P_{\text{clipped}}$. (d) The cropped and
reflectance-calibrated image of the diffuse sphere $D_{\text{real}}$.


For the diffuse paint used to paint the sphere in Figure 9.14(b), the reflectivity was 
measured to be (0.320, 0.333, 0.346) in the red, green, and blue channels, indicat- 
ing that the chosen paint is very slightly bluish. Dividing the pixel values of the dif- 
fuse sphere by its reflectivity yields the appearance of the diffuse sphere as if it were 
100% reflective, or perfectly white; we call this image $D_{\text{real}}$ (see Figure 9.14(d)).
Likewise, the image of the mirrored sphere should be divided by the mirrored
sphere’s reflectivity as described in Section 9.3.1 and Figure 9.10 to produce the
image that would have been obtained from a perfectly reflective mirrored sphere.
With both sphere images calibrated, we can use IBL to light a synthetic white
sphere with the clipped lighting environment image; we call the resulting rendering
$D_{\text{clipped}}$ (Figure 9.15). As expected, this synthetic sphere appears darker than the real sphere in
the actual lighting environment $D_{\text{real}}$. If we subtract $D_{\text{clipped}}$ from $D_{\text{real}}$, we obtain an
image of the missing reflected light from the sun region (which we call $D_{\text{sun}}$). This
operation leverages the additive property of light, described in detail in Section 9.9.
From the previous section we know that the sun is a 0.53-degree disk in the sky.
To properly add such a source to $P_{\text{clipped}}$, it should be placed in the right direction
and assigned the correct radiant intensity. The direction of the sun can usually be 
estimated from the clipped light probe image as the center of the saturated region 
in the image. The pixel coordinates (u, v) in this image can be converted to a di- 
rection vector (Dx, Dy, Dz) using the ideal sphere-mapping formula discussed in 
Section 9.4.1. If the saturated region is too large to locate the sun with sufficient 
precision, it is helpful to have at least one additional photograph of the mirrored 


FIGURE 9.15 The calibrated image of the real diffuse sphere $D_{\text{real}}$, minus the image of a
synthetic diffuse sphere lit by the clipped IBL environment $D_{\text{clipped}}$, yields an image of the missing
light from the sun $D_{\text{sun}}$: $D_{\text{real}} - D_{\text{clipped}} = D_{\text{sun}}$.
sphere taken with a shorter exposure time, or a photograph of a black plastic sphere
in which the sun’s position can be discerned more accurately. In this example, the 
sun’s direction vector was measured as (0.748, 0.199, 0.633) based on its position 
in the mirrored sphere image. 

To determine the sun’s radiance, we know that if the sun alone were to illuminate
a white diffuse sphere the sphere should appear similar to $D_{\text{sun}}$. We begin
by creating a sun disk with direction (0.748, 0.199, 0.633), an angular extent of
0.53 degrees diameter, and an experimental radiance value of $L_{\text{suntest}}$ = 46,700. The
specification for such a light source in a rendering file might look as follows. 


light sun directional { 
direction 0.748 0.199 0.633 
angle 0.5323 
color 46700 46700 46700 

} 


The value 46,700 is chosen for convenience to be the radiant intensity of a
0.53-degree-diameter infinite light source that illuminates a diffuse white sphere such
that its brightest spot (pointing toward the source) has a radiance of 1. Lighting a
white sphere with this source produces the image D_suntest seen in Figure 9.16.
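
The value 46,700 can be checked with a short calculation. The following is a minimal sketch, assuming an ideal Lambertian sphere with an albedo of 1 and a uniform disk source; the function name is illustrative and does not come from any renderer.

```python
import math

def unit_peak_source_radiance(angular_diameter_deg):
    """Radiance a uniform disk source needs so that a white (albedo 1)
    Lambertian surface facing it reflects a radiance of exactly 1.

    Irradiance from a disk of half-angle a centered on the surface normal:
        E = L * pi * sin(a)**2
    Reflected radiance of a Lambertian surface with albedo 1:
        L_out = E / pi = L * sin(a)**2
    Setting L_out = 1 gives L = 1 / sin(a)**2.
    """
    half_angle = math.radians(angular_diameter_deg) / 2.0
    return 1.0 / math.sin(half_angle) ** 2

# A 0.53-degree disk, as used for the sun in the text.
l_suntest = unit_peak_source_radiance(0.53)
print(round(l_suntest))
```

Under these assumptions the result lies within a fraction of a percent of 46,700, which is consistent with the rounded value quoted in the text.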

From the equation shown in Figure 9.16, we can solve for the unknown color
α that best scales D_suntest to match D_sun. This is easily accomplished by dividing
the average pixel values of the two images: α = avg(D_sun) / avg(D_suntest). Then, we can
compute the correct radiant intensity of the sun as L_sun = α L_suntest. For this example,
applying this procedure to each color channel yields α = (1.166, 0.973, 0.701),
which produces L_sun = (54500, 45400, 32700). Replacing the L_suntest value in the
directional light specification file with this new L_sun value produces a directional
light that models the missing sunlight in the clipped light probe image.
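
The scaling step can be sketched in a few lines. The example below is a minimal illustration that assumes both images are given as lists of (r, g, b) tuples covering the same sphere pixels; the function names and the two-pixel test images are hypothetical, chosen only so that the channel ratios reproduce the α reported above.

```python
def channel_means(pixels):
    """Per-channel mean of a list of (r, g, b) pixel tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def solve_sun_radiance(d_sun, d_suntest, l_suntest):
    """Return (alpha, l_sun), where alpha = avg(D_sun) / avg(D_suntest)
    and l_sun = alpha * l_suntest, both computed per color channel."""
    mean_sun = channel_means(d_sun)
    mean_test = channel_means(d_suntest)
    alpha = tuple(s / t for s, t in zip(mean_sun, mean_test))
    l_sun = tuple(a * l_suntest for a in alpha)
    return alpha, l_sun

# Hypothetical two-pixel images (not data from the text).
d_suntest = [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
d_sun = [(0.583, 0.4865, 0.3505), (1.166, 0.973, 0.701)]

alpha, l_sun = solve_sun_radiance(d_sun, d_suntest, 46700.0)
print(alpha)
print([round(x, -2) for x in l_sun])
```

Rounded to three significant figures, the resulting L_sun agrees with the (54500, 45400, 32700) given in the text.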

FIGURE 9.16 (α D_suntest = D_sun) Lighting a white sphere (D_suntest) from the direction of the sun with an
experimental radiant intensity L_suntest sets up an equation to determine the missing light from the
sun.

FIGURE 9.17 (a) The calibrated diffuse sphere D_real in the lighting environment. (b) A rendered
diffuse sphere, illuminated by the incomplete probe P_clipped and the recovered sun. (c) The
difference between the two. The nearly black image indicates that the lighting environment was
recovered accurately.

We can validate the accuracy of this procedure by lighting a diffuse sphere with
the combined illumination from the clipped probe P_clipped and the reconstructed sun.
Figures 9.17(a) and (b) show a comparison between the real diffuse sphere D_real
and a synthetic diffuse sphere illuminated by the recovered environment. Subtracting
(a) from (b) (shown in Figure 9.17(c)) allows us to visually and quantitatively
verify the accuracy of the lighting reconstruction. The difference image is nearly
black, which indicates a close match. In this case, the root mean squared intensity
difference between the real and simulated spheres is less than 2%, indicating a close
numeric match as well. The area of greatest error is in the lower left of the difference
image, which is a dim beige color instead of black due to unintended bounced
light on the real sphere from the person’s hand holding the sphere.
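
The numeric comparison can be sketched as follows. The text does not state how the root mean squared difference is normalized to obtain a percentage, so dividing by the mean intensity of the real image is an assumption of this sketch, as are the function names.

```python
import math

def rms_difference(image_a, image_b):
    """Root mean squared difference between two equally sized lists of
    grayscale intensities."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    squared = [(a - b) ** 2 for a, b in zip(image_a, image_b)]
    return math.sqrt(sum(squared) / len(squared))

def relative_rms_difference(real, rendered):
    """RMS difference as a fraction of the mean intensity of the real image.
    This normalization is an assumption, not taken from the text."""
    mean_real = sum(real) / len(real)
    return rms_difference(real, rendered) / mean_real

# Hypothetical intensities: the rendering is off by 2% at every pixel.
real = [1.0, 1.0, 1.0, 1.0]
rendered = [1.02, 0.98, 1.02, 0.98]
print(relative_rms_difference(real, rendered))
```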

At this point, the complete lighting environment is still divided into two pieces: 
the clipped IBL environment and the directional sun source. To unify them, the sun 
disk could be rasterized into a mirror reflection image space and then added to the 
clipped IBL environment. However, as we will see in Section 9.6.1, it can be a great 
benefit to the rendering process if concentrated lights such as the sun are simu- 
lated as direct light sources rather than as part of the IBL environment (because of 
the sampling problem). Figure 9.38(e) later in this chapter shows a rendering of 
a collection of CG objects illuminated by the combination of the separate clipped 
probe and the direct sun source. The appearance of the objects is realistic and con- 
sistent with their environment. The rendering makes use of an additional technique 
(described in Section 9.7) to have the objects cast shadows onto the scene as well. 

Images with concentrated light sources such as the sun not only require addi- 
tional care to acquire but also pose computational challenges for image-based light- 
ing algorithms. Section 9.6 describes why these challenges occur and how they can 
be solved through importance sampling techniques. Before beginning that topic, 
we will describe some of the omnidirectional image-mapping formats commonly 
used to store light probe images. 


9.4 OMNIDIRECTIONAL IMAGE MAPPINGS 


Once a light probe image is captured, it needs to be stored in an image file using
an omnidirectional image mapping. This section describes four of the most
commonly used image mappings and provides formulas to determine the appropriate
(u, v) coordinates in the image corresponding to a unit direction in the world
D = (Dx, Dy, Dz), and vice versa. These formulas all assume a right-handed coordinate
system in which (0, 0, -1) is forward, (1, 0, 0) is right, and (0, 1, 0) is up.

This section also discusses some of the advantages and disadvantages of each 
format. These considerations include the complexity of the mapping equations, 
how much distortion the mapping introduces, and whether the mapping has special 
features that facilitate computing properties of the image or using it with specific 




rendering algorithms. The mappings this section presents are the ideal mirrored sphere, 
the angular map, latitude-longitude, and the cube map. 


9.4.1 IDEAL MIRRORED SPHERE 


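For the ideal mirrored sphere we use a circle within the square image domain of
u ∈ [0, 1], v ∈ [0, 1]. The mapping equation for world to image is as follows.


$$r = \frac{\sin\left(\frac{1}{2}\arccos(-D_z)\right)}{2\sqrt{D_x^2 + D_y^2}}$$

$$(u, v) = \left(\frac{1}{2} + r D_x,\ \frac{1}{2} - r D_y\right)$$


The mapping equation for image to world is as follows, where atan2(y, x) follows
the usual two-argument arctangent convention.


$$r = \sqrt{(2u - 1)^2 + (2v - 1)^2}$$

$$(\theta, \phi) = \left(\operatorname{atan2}(-2v + 1,\ 2u - 1),\ 2\arcsin(r)\right)$$

$$(D_x, D_y, D_z) = (\sin\phi\cos\theta,\ \sin\phi\sin\theta,\ -\cos\phi)$$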


The ideal mirrored sphere mapping (Figure 9.18(a)) is how the world looks when 
reflected in a mirrored sphere, assuming an orthographic camera and a world that is 
distant relative to the sphere’s diameter. In practice, real spheres exhibit a mapping 
similar to this ideal one as long as the sphere is small relative to its distance to the 
camera and to the objects in the environment. Of course, the reflection in a sphere 
is actually a mirror image of the environment, and thus the image should be flipped 
in order to be consistent with directions in the world. 
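
The two mirrored sphere formulas can be exercised with a small round trip. This is a minimal sketch assuming unit-length input directions and the coordinate conventions stated at the start of Section 9.4; the function names are illustrative.

```python
import math

def sphere_world_to_image(d):
    """Map a unit direction (Dx, Dy, Dz) to (u, v) in the mirrored sphere image."""
    dx, dy, dz = d
    xy = math.hypot(dx, dy)
    if xy == 0.0:
        if dz < 0.0:
            return (0.5, 0.5)  # straight forward lands on the image center
        raise ValueError("straight back maps to the whole rim, not to one point")
    phi = math.acos(max(-1.0, min(1.0, -dz)))  # angle away from straight forward
    r = math.sin(0.5 * phi) / (2.0 * xy)
    return (0.5 + r * dx, 0.5 - r * dy)

def sphere_image_to_world(u, v):
    """Map (u, v) inside the image circle back to a unit direction."""
    x, y = 2.0 * u - 1.0, 2.0 * v - 1.0
    r = math.hypot(x, y)
    if r > 1.0 + 1e-12:
        raise ValueError("(u, v) lies outside the image circle")
    theta = math.atan2(-y, x)
    phi = 2.0 * math.asin(min(r, 1.0))
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            -math.cos(phi))

# Straight right should land sqrt(2)/2 of the way from the center to the edge.
u_right, v_right = sphere_world_to_image((1.0, 0.0, 0.0))
print(u_right, v_right)
```

Feeding the result of `sphere_world_to_image` into `sphere_image_to_world` should return the original direction for every direction except straight back.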

Like all mappings in this section, the ideal mirrored sphere reflects all directions
of the environment. Straight forward (that is, D = (0, 0, -1)) appears in the
center of the image. Straight right, D = (1, 0, 0), appears √2/2 of the way from the
center to the right-hand edge of the image. Straight up appears √2/2 of the way
from the center to the top of the image. Straight back is the one direction that
does not cleanly map to a particular image coordinate, and corresponds to the outer
circumference of the circle’s edge.

FIGURE 9.18 (a) The ideal mirrored sphere mapping. (b) The angular map mapping. The
mappings appear similar, but devote different amounts of image area to the front versus the back
half of the environment.

From these coordinates, we see that the front half of the environment is contained
within a disk that is √2/2 of the diameter of the full image. This makes the
area taken up by the front half of the environment precisely equal to the area taken
up by the back half of the environment. This property generalizes in that any two
regions of equal solid angle in the scene will map to the same amount of area in
the image. One use of this equal-area property is to calculate the average illumination
color in a light probe (the average pixel value within the image circle is the
average value in the environment). If the image has a black background behind
the circular image, the average value in the scene is 4/π of the average of the entire
square.

A significant disadvantage with the mirrored sphere mapping is that the back half 
of the environment becomes significantly stretched in one direction and squeezed 
in the other. This problem increases in significance toward the outer edge of the 
circle, becoming extreme at the edge. This can lead to the same problem we saw 
with mirrored sphere light probe images, in that the regions around the edge are 
poorly sampled in the radial direction. Because of this, the mirrored sphere format 
is not a preferred format for storing omnidirectional images, and the angular map 
format is frequently used instead. 


9.4.2 ANGULAR MAP 


For the angular map we also use a circle within the square image domain of u ∈
[0, 1], v ∈ [0, 1]. The mapping equation for world to image is as follows.


$$r = \frac{\arccos(-D_z)}{2\pi\sqrt{D_x^2 + D_y^2}}$$

$$(u, v) = \left(\frac{1}{2} + r D_x,\ \frac{1}{2} - r D_y\right)$$


The equation for image to world is as follows.


$$(\theta, \phi) = \left(\operatorname{atan2}(-2v + 1,\ 2u - 1),\ \pi\sqrt{(2u - 1)^2 + (2v - 1)^2}\right)$$

$$(D_x, D_y, D_z) = (\sin\phi\cos\theta,\ \sin\phi\sin\theta,\ -\cos\phi)$$


The angular map format (Figure 9.18(b)) is similar in appearance to the mir- 
rored sphere mapping, but it samples the directions in a manner that avoids un- 
dersampling the regions around the edges. In this mapping, the distance of a point 
from the center of the image is directly proportional to the angle between straight 
ahead and its direction in the world. In this way, straight forward appears at the 
center of the image, and straight right and straight up appear halfway to the edge of 
the image. Regions that map near the edge of the sphere are sampled with at least
as many pixels per degree in any direction as the center of the image. Because of
this property, many light probe images (including those in the Light Probe Image 
Gallery [160]) are available in this format. Unlike the mirrored sphere mapping, 
the angular map is not equal area, and does not translate solid angle proportion- 
ately into image area. Areas near the edge become stretched in the direction tangent 
to the circumference but are neither stretched nor squeezed in the perpendicular 
direction, making them overrepresented in the mapping. 

Because the mirrored sphere and angular map mappings appear similar, some- 
times an angular map image is loaded into a rendering program as if it were a 
mirrored sphere image, and vice versa. The result is that the environment becomes 
distorted, and straight vertical lines appear curved. One way to tell which mapping 
such an image is in is to convert it to the latitude-longitude format (in which ver- 
tical lines should be straight) or the cube map format, in which all straight lines 
should be straight except at face boundaries. 
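
The angular map formulas can be sketched in the same way as the mirrored sphere. This is a minimal sketch assuming unit-length directions and the same coordinate conventions; the function names are illustrative.

```python
import math

def angular_world_to_image(d):
    """Map a unit direction (Dx, Dy, Dz) to (u, v) in the angular map."""
    dx, dy, dz = d
    xy = math.hypot(dx, dy)
    if xy == 0.0:
        if dz < 0.0:
            return (0.5, 0.5)  # straight forward lands on the image center
        raise ValueError("straight back maps to the whole rim, not to one point")
    phi = math.acos(max(-1.0, min(1.0, -dz)))  # angle away from straight forward
    r = phi / (2.0 * math.pi * xy)
    return (0.5 + r * dx, 0.5 - r * dy)

def angular_image_to_world(u, v):
    """Map (u, v) inside the image circle back to a unit direction."""
    x, y = 2.0 * u - 1.0, 2.0 * v - 1.0
    radius = math.hypot(x, y)
    if radius > 1.0 + 1e-12:
        raise ValueError("(u, v) lies outside the image circle")
    theta = math.atan2(-y, x)
    phi = math.pi * min(radius, 1.0)  # image radius is proportional to the angle
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            -math.cos(phi))

# Straight right and straight up should both land halfway to the edge.
right_uv = angular_world_to_image((1.0, 0.0, 0.0))
up_uv = angular_world_to_image((0.0, 1.0, 0.0))
print(right_uv, up_uv)
```

Comparing these coordinates with those of the mirrored sphere for the same direction (about 0.854 versus 0.75 for straight right) shows why loading one format as the other distorts the environment.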


9.4.3 LATITUDE-LONGITUDE 


For the latitude-longitude mapping we use a rectangular image domain of u ∈
[0, 2], v ∈ [0, 1]. The mapping equation for world to image is as follows.


$$(u, v) = \left(1 + \frac{1}{\pi}\operatorname{atan2}(D_x, -D_z),\ \frac{1}{\pi}\arccos D_y\right)$$


The equation for image to world is as follows.


$$(\theta, \phi) = (\pi(u - 1),\ \pi v)$$

$$(D_x, D_y, D_z) = (\sin\phi\sin\theta,\ \cos\phi,\ -\sin\phi\cos\theta)$$


The latitude-longitude mapping (Figure 9.19(a)) maps a direction’s azimuth to the 
horizontal coordinate and its elevation to the vertical coordinate of the image. This 
is known to cartographers as an equirectangular mapping. Unlike the previous two map- 
pings, it flattens the sphere into a rectangular area. The top edge of the image cor- 
responds to straight up, and the bottom edge of the image is straight down. The 
format most naturally has a 2:1 aspect ratio (360 degrees by 180 degrees), as this introduces
the least distortion for regions near the horizon. The areas toward straight
up and straight down are sampled equivalently with the regions near the horizon in
the vertical direction, and are progressively more oversampled in the horizontal
direction toward the top and bottom edges.

The latitude-longitude mapping is convenient because it is rectangular and has 
no seams (other than at the poles), and because the mapping formulas are simple 
and intuitive. Although straight lines generally become curved in this format, ver- 
tical lines in the scene map to straight vertical lines in the mapping. Another useful 
property is that in this format the lighting environment can be rotated around the y 
axis simply by translating the image horizontally. The mapping is not equal area (the
factor by which any particular area is overrepresented is inversely proportional to the
cosine of its latitude, which equals sin φ for the angle φ = πv defined above). Thus, to find
the average pixel value in a light probe image in this format one can weight each pixel by
this vertical cosine falloff function and compute the weighted average of the pixels.
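
The latitude-longitude formulas and the weighted average can be sketched as follows. This is a minimal sketch assuming unit-length directions and an image stored as a list of rows of scalar values, with row 0 at the top; the function names and the small test image are illustrative.

```python
import math

def latlong_world_to_image(d):
    """Map a unit direction (Dx, Dy, Dz) to (u, v) with u in [0, 2], v in [0, 1]."""
    dx, dy, dz = d
    u = 1.0 + math.atan2(dx, -dz) / math.pi
    v = math.acos(max(-1.0, min(1.0, dy))) / math.pi
    return (u, v)

def latlong_image_to_world(u, v):
    """Map (u, v) back to a unit direction."""
    theta = math.pi * (u - 1.0)
    phi = math.pi * v
    return (math.sin(phi) * math.sin(theta),
            math.cos(phi),
            -math.sin(phi) * math.cos(theta))

def latlong_average(image):
    """Solid-angle-weighted mean of a latitude-longitude image.
    Each row is weighted by sin(phi) at its center, which is the cosine
    of that row's latitude."""
    height = len(image)
    total = 0.0
    weight_sum = 0.0
    for row_index, row in enumerate(image):
        phi = math.pi * (row_index + 0.5) / height
        weight = math.sin(phi)
        total += weight * sum(row)
        weight_sum += weight * len(row)
    return total / weight_sum

# Hypothetical 4 x 8 image with bright rows at the poles.
image = [[10.0] * 8, [1.0] * 8, [1.0] * 8, [10.0] * 8]
plain_average = sum(sum(row) for row in image) / 32.0
print(plain_average, latlong_average(image))
```

Because the polar rows are overrepresented in the image, the weighted average of this test image comes out lower than the plain pixel average.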


9.4.4 CUBE MAP 


For the cube map we use a rectangular image domain of u € [0, 3], v € [0, 4]. The 
cube map formulas require branching to determine which face of the cube corre- 
sponds to each direction vector or image coordinate. Thus, they are presented as 
pseudocode. The code for world to image is as follows.


    if ((Dz<0) && (Dz<=-abs(Dx))
               && (Dz<=-abs(Dy))) {             // forward
        u = 1.5 - 0.5 * Dx / Dz;
        v = 1.5 + 0.5 * Dy / Dz;
    } else if ((Dz>=0) && (Dz>=abs(Dx))
                       && (Dz>=abs(Dy))) {      // backward
        u = 1.5 + 0.5 * Dx / Dz;
        v = 3.5 + 0.5 * Dy / Dz;
    } else if ((Dy<=0) && (Dy<=-abs(Dx))
                       && (Dy<=-abs(Dz))) {     // down
        u = 1.5 - 0.5 * Dx / Dy;
        v = 2.5 - 0.5 * Dz / Dy;
    } else if ((Dy>=0) && (Dy>=abs(Dx))
                       && (Dy>=abs(Dz))) {      // up
        u = 1.5 + 0.5 * Dx / Dy;
        v = 0.5 - 0.5 * Dz / Dy;
    } else if ((Dx<=0) && (Dx<=-abs(Dy))
                       && (Dx<=-abs(Dz))) {     // left
        u = 0.5 + 0.5 * Dz / Dx;
        v = 1.5 + 0.5 * Dy / Dx;
    } else if ((Dx>=0) && (Dx>=abs(Dy))
                       && (Dx>=abs(Dz))) {      // right
        u = 2.5 + 0.5 * Dz / Dx;
        v = 1.5 - 0.5 * Dy / Dx;
    }


The code for image to world is as follows.


    if u>=1 and u<2 and v<1                     // up
        Vx = (u - 1.5) * 2
        Vy = 1.0
        Vz = (v - 0.5) * -2
    else if u<1 and v>=1 and v<2                // left
        Vx = -1.0
        Vy = (v - 1.5) * -2
        Vz = (u - 0.5) * -2
    else if u>=1 and u<2 and v>=1 and v<2       // forward
        Vx = (u - 1.5) * 2
        Vy = (v - 1.5) * -2
        Vz = -1.0
    else if u>=2 and v>=1 and v<2               // right
        Vx = 1.0
        Vy = (v - 1.5) * -2
        Vz = (u - 2.5) * 2
    else if u>=1 and u<2 and v>=2 and v<3       // down
        Vx = (u - 1.5) * 2
        Vy = -1.0
        Vz = (v - 2.5) * 2
    else if u>=1 and u<2 and v>=3               // backward
        Vx = (u - 1.5) * 2
        Vy = (v - 3.5) * 2
        Vz = 1.0

    normalize = 1 / sqrt(Vx * Vx + Vy * Vy + Vz * Vz)

    Dx = normalize * Vx
    Dy = normalize * Vy
    Dz = normalize * Vz


In the cube map format (Figure 9.19(b)) the scene is represented as six square
perspective views, each with a 90-degree field of view, which is equivalent to projecting
the environment onto a cube and then unfolding it. The six squares are most
naturally unfolded into a horizontal or vertical cross shape but can also be packed
into a 3 × 2 or 6 × 1 rectangle to conserve image space. The mapping is not equal
area (areas in the corners of the cube faces take up significantly more image area per
solid angle than the areas in the center of each face). However, unlike the angular
map and latitude-longitude mappings, this relative stretching is bounded: angular
areas that map to the cube’s corners are overrepresented by a factor of up to 3√3 in
area relative to regions in the center.

This mapping requires six different formulas to convert between world directions 
and image coordinates, depending on which face of the cube the pixel falls within. 
Although the equations include branching, they can be more efficient to evaluate 
than the other mappings (which involve transcendental functions such as asin and 
atan2). This image format is sometimes the most convenient for editing the light 
probe image, because straight lines in the environment remain straight in the image 
(although there are generally directional discontinuities at face boundaries). For 
showing a light probe image in the background of a real-time rendering application, 
the image mapping is straightforward to texture map onto a surrounding cubical 
surface. 
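
The cube map pseudocode translates directly into a runnable form. The following is a minimal Python sketch of the vertical-cross layout with u ∈ [0, 3], v ∈ [0, 4]; the function names are illustrative. The last function evaluates the corner stretching factor, using the fact that a point at face coordinates (s, t) lies at distance √(1 + s² + t²) from the cube center, so image area per solid angle grows with the cube of that distance.

```python
import math

def cube_world_to_image(d):
    """Map a nonzero direction to (u, v) in the vertical-cross cube map."""
    dx, dy, dz = d
    ax, ay, az = abs(dx), abs(dy), abs(dz)
    if ax == 0.0 and ay == 0.0 and az == 0.0:
        raise ValueError("direction must be nonzero")
    if dz < 0 and az >= ax and az >= ay:       # forward
        return (1.5 - 0.5 * dx / dz, 1.5 + 0.5 * dy / dz)
    if dz >= 0 and az >= ax and az >= ay:      # backward
        return (1.5 + 0.5 * dx / dz, 3.5 + 0.5 * dy / dz)
    if dy <= 0 and ay >= ax and ay >= az:      # down
        return (1.5 - 0.5 * dx / dy, 2.5 - 0.5 * dz / dy)
    if dy >= 0 and ay >= ax and ay >= az:      # up
        return (1.5 + 0.5 * dx / dy, 0.5 - 0.5 * dz / dy)
    if dx <= 0:                                # left
        return (0.5 + 0.5 * dz / dx, 1.5 + 0.5 * dy / dx)
    return (2.5 + 0.5 * dz / dx, 1.5 - 0.5 * dy / dx)  # right

def cube_image_to_world(u, v):
    """Map (u, v) on the cross back to a unit direction."""
    if 1 <= u < 2 and 0 <= v < 1:              # up
        vec = ((u - 1.5) * 2, 1.0, (v - 0.5) * -2)
    elif 0 <= u < 1 and 1 <= v < 2:            # left
        vec = (-1.0, (v - 1.5) * -2, (u - 0.5) * -2)
    elif 1 <= u < 2 and 1 <= v < 2:            # forward
        vec = ((u - 1.5) * 2, (v - 1.5) * -2, -1.0)
    elif 2 <= u <= 3 and 1 <= v < 2:           # right
        vec = (1.0, (v - 1.5) * -2, (u - 2.5) * 2)
    elif 1 <= u < 2 and 2 <= v < 3:            # down
        vec = ((u - 1.5) * 2, -1.0, (v - 2.5) * 2)
    elif 1 <= u < 2 and 3 <= v <= 4:           # backward
        vec = ((u - 1.5) * 2, (v - 3.5) * 2, 1.0)
    else:
        raise ValueError("(u, v) lies outside the cross")
    length = math.sqrt(sum(c * c for c in vec))
    return tuple(c / length for c in vec)

def relative_area_per_solid_angle(s, t):
    """Image area per unit solid angle at face coordinates (s, t) in [-1, 1],
    relative to the face center."""
    return (1.0 + s * s + t * t) ** 1.5

print(cube_world_to_image((0.0, 0.0, -1.0)))
print(relative_area_per_solid_angle(1.0, 1.0))
```

The corner value equals 3√3, about 5.2, which matches the bound quoted in the text.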


9.5 HOW A GLOBAL ILLUMINATION RENDERER 
COMPUTES IBL IMAGES 


Returning to the rendering process, it is instructive to look at how a global illumina- 
tion algorithm computes IBL images such as those seen in Figure 9.4. In general, the 
algorithm needs to estimate how much light arrives from the lighting environment
and the rest of the scene at each surface point, which in large part is a matter of visibility:
light from the visible parts of the environment must be summed, and light
that is blocked by other parts of the scene must instead be replaced by an estimate 
of the light reflecting from those surfaces. This measure of incident illumination, 
multiplied by the reflectance of the surface itself, becomes the color rendered at a 
particular pixel in the image. 

The RNL animation was rendered with RADIANCE [205], which like most mod- 
ern global illumination systems is based on ray tracing. The image is rendered one 
pixel at a time, and for each pixel the renderer needs to determine the RGB color 
L to display for that pixel. In our case, L is an HDR RGB pixel value, with its three 
components proportional to the amount of red, green, and blue radiance arriving 
toward the camera in the direction corresponding to the pixel. For each pixel, a ray 
R is traced from the camera C (Figure 9.20(a)) until it hits a surface in the scene 
at a 3D point P. L is then computed as a function of the reflectance properties of 
the surface at P and the incident light arriving at P. This section lists the different 
types of surfaces R can hit and how the rendering system then computes their ap- 
pearance as lit by the scene. We will see that the most costly part of the process is 
computing how light reflects from diffuse surfaces. Later in this chapter, we will see 
how understanding this rendering process can motivate techniques for increasing 
the rendering efficiency. 


Case 1: R Hits the IBL Environment. If the ray strikes the emissive surface surrounding
the scene at point P (Figure 9.20(a)), the pixel color L in the rendering
is computed as the color from the light probe image that was texture mapped onto 
the surface at P. In RNL, this could be a green color from the leaves of the trees, a 
bright blue color from the sky, or a brown color from the ground below, depending 
on which pixel and where the camera is looking. The HDR range of the surrounding 
environment is transferred to the resulting HDR renderings. 

In this case, the renderer does not take into consideration how the surface at
point P is being illuminated, because the surface is specified to be emissive rather
than reflective. This is not exactly how light behaves in the real world, in that most
objects placed in a scene have at least a minute effect on the light arriving at all
other surfaces in the scene. In RNL, however, the only place where this might be
noticeable is on the ground close to the pedestal, which might receive less light
from the environment due to shadowing from the pedestal. Because this area was
not seen in the RNL animation, the effect did not need to be simulated.

FIGURE 9.20 Probe mapping: A mirrored sphere reflects the entire surrounding environment
except for the light blue area obscured by the ball. The ideal mirrored sphere mapping (see
Section 9.4.1) maps a light probe image onto an IBL environment surface. Case 1: A ray R sent
from the virtual camera C hits the IBL environment surface at P. The HDR value L on the
surface is copied to the image pixel. Case 2: R hits a specular surface of a CG object. The reflected
ray R’ is traced, in this case striking the IBL environment at P’ with an HDR value of L’. L’
is multiplied by the object’s specular color to determine the final pixel value L. (For a translucent
surface, a refracted ray R” is also traced.) Case 3: R hits a diffuse surface at P. A multitude of
rays R'_i are traced into the scene to determine the irradiance E at P as a weighted average of the
incident light values L'_i from points P'_i. E is multiplied by the diffuse object color to determine
the final pixel value L.


Case 2: R Hits a Mirror-like Specular Surface. A mirror-like specular surface
having no roughness shows a clear image of the scene around it. In RNL, the glass,
plastic, and metallic spheres have mirror-like specular components. When the ray 
R hits such a surface at a point P (Figure 9.20(b)), the renderer reflects the ray 
about P’s surface normal N and follows this reflected ray R’ into the scene until it 
strikes a new surface at point P’. It then recursively computes the color L’ of the 
light coming from P’ along R’ toward P in precisely the same way it computes 
the light coming from a point P along R toward the camera. For example, if R’ 
strikes the emissive surface surrounding the scene at P’, the renderer retrieves the 
appropriate pixel color L’ from the surrounding light probe image (an instance of 
Case 1). The recursion depth for computing bounces of light is usually limited to a 
user-defined number, such as six bounces for specular reflections and two bounces 
for diffuse reflections. 

The incident light L’ along the reflected ray is then used to produce the specular 
component of the resulting light L reflecting from the surface. For metals, this 
light is multiplied by the metallic color of the surface. For example, a gold material 
might have a metallic color of (0.8, 0.6, 0.3). Because these color components are 
less than 1, the metallic reflection will reveal some of the detail in the reflected HDR 
environment not seen directly. 
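
The mirror reflection and the metallic tint can be sketched as follows. This is a minimal sketch assuming the incoming ray direction and the surface normal are given as 3-tuples and that the normal has unit length; the function names and the sample radiance value are illustrative.

```python
def reflect(r, n):
    """Mirror the incoming ray direction r about the unit surface normal n:
    r' = r - 2 (n . r) n."""
    d = sum(a * b for a, b in zip(r, n))
    return tuple(a - 2.0 * d * b for a, b in zip(r, n))

def metallic_specular(l_incident, metal_color):
    """Scale the incident HDR radiance L' by the metallic color, per channel."""
    return tuple(l * c for l, c in zip(l_incident, metal_color))

# A ray travelling straight down onto an upward-facing surface bounces straight up.
r_reflected = reflect((0.0, -1.0, 0.0), (0.0, 1.0, 0.0))

# Hypothetical HDR sky radiance seen along the reflected ray, tinted by gold.
l_out = metallic_specular((10.0, 10.0, 10.0), (0.8, 0.6, 0.3))
print(r_reflected, l_out)
```

In a full renderer the incident radiance would come from recursively tracing the reflected ray, which this sketch replaces with a fixed value.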

For glass-like materials, the mirror-like specular component is fainter, typically 
in the range of 4 to 8% (though at grazing angles it becomes greater due to Fresnel 
reflection), depending on the index of refraction of the material. Thus, the light L’ 
is multiplied by a small value to create the specular component of the color L in the 
image. In the case of RNL, this makes the bright sky detail particularly evident in the 
reflections seen in the top of the large glass ball (as in Figure 9.4(a)). For glass-like 
materials, a second refracted ray R” is also traced through the translucent surface, 
and the light L” arriving along this ray is added to the total light L reflected toward 
the camera. 

For plastic materials, there is both a specular component and a diffuse compo- 
nent to the reflection. For such materials the specular component is computed in 
the same way as the specular component of glass-like materials and is added to the 
diffuse component, which is computed as described in Case 3 below.

Case 3: R Hits a Diffuse Surface. Diffuse surfaces reflect light equally in all directions,
and light arriving from every direction in the upper hemisphere of the
surface contributes to the reflected light. Because of this, computing the light reflecting
from diffuse surfaces can be computationally expensive. The total amount
of light arriving at a surface point P is called its irradiance, denoted by E, which is
a weighted integral of all colors L'_i arriving along all rays R'_i (Figure 9.20(c)). The
contribution of each ray R'_i is weighted by the cosine of the angle θ_i between it and
the surface normal N, because light arriving from oblique angles provides less illumination
per unit area. If we denote L'(P, ω) to be the function representing the
incident light arriving at P from the angular direction ω in P’s upper hemisphere
Ω, E can be written as


$$E(P, N) = \int_{\Omega} L'(P, \omega) \cos\theta \, d\omega.$$


Unfortunately, E cannot be computed analytically, because it is dependent on not
just the point’s view of the environment but also on how this light is occluded and
reflected by all other surfaces in the scene visible to the point. To estimate E, the
renderer takes a weighted average of the light colors L'_i arriving from a multitude of
rays R'_i sent out from P to sample the incident illumination. When a ray R'_i strikes
the surface surrounding the scene, the renderer adds the corresponding pixel color L'_i
from the lighting environment to the sum, weighted by the cosine of the angle θ_i
between N and R'_i. When a ray R'_i strikes another part of the scene P'_i, the renderer
recursively computes the color of light L'_i reflected from P'_i toward P and adds this
to the sum as well, again weighted by cos θ_i. Finally, E is estimated as this sum divided
by the total number of rays sent out, as follows.


$$E(P, N) \approx \frac{1}{k} \sum_{i=1}^{k} L'_i \cos\theta_i$$


The accuracy of the estimate of E increases with the number of rays k sent out from
P. Once E is computed, the final light L drawn for the pixel is the surface’s diffuse
color (called its albedo, often denoted by ρ) multiplied by the irradiance E.
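
The sampling estimate can be sketched as a small Monte Carlo loop. This is a minimal sketch, not RADIANCE's implementation: it works in a local frame whose +z axis is the surface normal, draws ray directions uniformly over the hemisphere, and uses a hypothetical `incident_radiance` callback in place of tracing each ray into the scene. The factor 2π (the solid angle of the hemisphere) is an assumption of this sketch that converts the cosine-weighted average into the value of the integral for uniformly distributed rays.

```python
import math
import random

def sample_hemisphere_uniform(rng):
    """Uniform random direction on the hemisphere around +z, returned as
    (x, y, z) with z = cos(theta) >= 0."""
    z = rng.random()                      # cos(theta) is uniform for uniform directions
    azimuth = 2.0 * math.pi * rng.random()
    s = math.sqrt(max(0.0, 1.0 - z * z))
    return (s * math.cos(azimuth), s * math.sin(azimuth), z)

def estimate_irradiance(incident_radiance, k, rng):
    """Estimate E, the integral of L' cos(theta) over the hemisphere, from k rays."""
    total = 0.0
    for _ in range(k):
        direction = sample_hemisphere_uniform(rng)
        cos_theta = direction[2]
        total += incident_radiance(direction) * cos_theta
    # total / k is the cosine-weighted average of the sampled radiance;
    # 2*pi scales it to the integral over the hemisphere.
    return 2.0 * math.pi * total / k

# A uniformly lit environment of radiance 1 has an exact irradiance of pi.
rng = random.Random(1)
e_uniform = estimate_irradiance(lambda direction: 1.0, 20000, rng)
print(e_uniform)
```

Increasing `k` tightens the estimate around π for this constant environment, in line with the statement that accuracy improves with the number of rays.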

Computing the integral of the incident illumination on an object’s surface performs
a blurring process on the HDR light probe image that is similar to the blurring
we have seen in Section 9.2.5. HDR pixels in the image are averaged together according
to a filter, in this case a blur over the upper hemisphere. Just as before, this
process makes the effect of HDR pixel values in an environment visible in the much 
lower dynamic range values reflecting from diffuse surfaces. Clipping the lighting 
environment to a display’s white point before computing the lighting integral would 
significantly change the values of the computed irradiance. 

Because many rays must be traced, the process of sampling the light arriving 
from the rest of the scene at a point can be computationally intensive. When rays 
are sent out at random, the number of rays needed to estimate the incident illumina- 
tion to a given expected accuracy is proportional to the variance of the illumination 
within the scene. To conserve computation, some renderers (such as RADIANCE) 
can compute E for a subset of the points P in the scene, and for other points interpolate
the irradiance from nearby samples, a process known as irradiance caching.
Furthermore, irradiance samples computed for one frame of an animation can be 
reused to render subsequent frames, as long as the scene remains static. Both of 
these features were used to reduce the rendering time for the RNL animation by a 
factor of several thousand. 


9.5.2 SAMPLING OTHER SURFACE REFLECTANCE 
TYPES 


In RNL, all of the surfaces had either a mirror-like specular component or com- 
pletely diffuse reflectance, or a combination of the two. Other common mate- 
rial types include rough specular reflections and more general bidirectional re- 
flectance distribution functions (BRDFs) [190] that exhibit behaviors such as 
retroreflection and anisotropy. For such surfaces, the incident illumination needs 
to be sampled according to these more general distributions. For example, in the 
case of rough specular surfaces the rendering system needs to send out a multi- 
tude of rays in the general direction of the reflected angle R’ with a distribution 
whose spread varies with the specular roughness parameter. Several BRDF mod- 
els (such as Lafortune et al. [178] and Ashikhmin and Shirley [153]) have as- 
sociated sampling algorithms that generate reflected ray directions in a distrib- 
ution that matches the relative contribution of incident light directions for any 
particular viewing direction. Having such importance sampling functions is very helpful
for computing renderings efficiently, as the alternative is to send out significantly
more rays with a uniform distribution and weight the incident light arriving
along each ray according to the BRDF. When the number of rays is limited,
this can lead to noisy renderings in comparison to sampling in a distribution that 
matches the BRDF. Figure 9.21 shows an example of this difference from Lawrence 
et al. [181]. 

In general, using more samples produces higher-quality renderings with more
accurate lighting and less visible noise. We have just seen that for different types of
materials renderings are created most efficiently when samples are chosen according
to a distribution that matches the relative importance of each ray to the final
appearance of each pixel. Not surprisingly, the importance of different ray directions
depends not only on the BRDF of the material but also on the distribution of the
incident illumination in the environment. To render images as efficiently as possible,
it becomes important to account for the distribution of the incident illumination
in the sampling process, particularly for IBL environments with bright concentrated
light sources. The next section presents this problem and describes solutions for
sampling incident illumination efficiently.

FIGURE 9.21 IBL renderings of a pot from [181] computed with 75 rays per pixel to sample
the incident illumination for a vase with a rough specular BRDF. (a) Noisy results obtained by
using uniform sampling and modulating by the BRDF. (b) A result with less noise obtained by
sampling with a distribution based on a factored representation of the BRDF.


9.6 SAMPLING INCIDENT ILLUMINATION 
EFFICIENTLY 


We have just seen that the speed at which images can be computed using IBL as 
described in the previous section depends on the number of rays that need to be 
traced. The bulk of these rays are traced to estimate the illumination falling upon 
diffuse surfaces through sampling. The light falling on a surface (i.e., its irradiance 
E) is the average value of the radiance arriving along all light rays striking the 
surface, weighted by the cosine of the angle each ray makes with the surface normal. 
Because averaging together the light from every possible ray is impractical, global 
illumination algorithms estimate the average color using just a finite sampling of 
rays. This works very well when the lighting environment is generally uniform in 
color. In this case the average radiance from any small set of rays will be close to 
the average from all of the rays because no particular ray strays far from the average 
value to begin with. However, when lighting environments have some directions 
that are much brighter than the average color, it is possible for the average of just a 
sampling of the rays to differ greatly from the true average. 
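
The effect of a concentrated source on a small random sample can be illustrated with a two-valued toy environment. This is a minimal sketch with illustrative numbers: a source covering one hundred-thousandth of the environment, 81 rays per estimate, and a source brightness chosen so that it contributes about as much light as the rest of the environment. The function names are hypothetical.

```python
import random

def probability_all_rays_miss(source_fraction, rays):
    """Chance that none of `rays` uniformly random rays hits the source."""
    return (1.0 - source_fraction) ** rays

def estimate_average(source_fraction, source_value, background_value, rays, rng):
    """Average of `rays` uniformly random samples of a two-valued environment."""
    total = 0.0
    for _ in range(rays):
        hit = rng.random() < source_fraction
        total += source_value if hit else background_value
    return total / rays

fraction = 1.0e-5          # share of the environment covered by the source
sun, sky = 1.0e5, 1.0      # radiance of the source and of the background
true_average = fraction * sun + (1.0 - fraction) * sky

rng = random.Random(0)
estimates = [estimate_average(fraction, sun, sky, 81, rng) for _ in range(200)]
print(true_average, probability_all_rays_miss(fraction, 81))
print(min(estimates), max(estimates))
```

With these numbers almost every estimate misses the source and returns the background value alone, about half the true average, while any estimate that does hit the source overshoots the true average by several hundred times.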

When ray directions are chosen at random, the number of rays needed to ac- 
curately sample the illumination is proportional to the variance in the light probe 
image. Lighting environments that have low variance (such as cloudy skies) can be 
sampled accurately with just tens of rays per irradiance calculation. Environments 
with greater variance, such as scenes with concentrated light sources, can require 
hundreds or thousands of rays per irradiance calculation when rays are sent out at 
random. The Eucalyptus Grove light probe, which featured bright backlit clouds in 
the direction of the setting sun, required over 1,000 rays per irradiance calculation, 
making rendering RNL computationally expensive. In this section, we describe this 
sampling problem in detail and present several sampling techniques that mitigate 
the difficulties.

FIGURE 9.22 A laser-scanned 3D model of the Parthenon illuminated by a uniformly white
lighting environment, giving the appearance of a dense cloudy day.


Figure 9.22 shows a virtual model of the Parthenon rendered with IBL using 
a synthetic lighting environment of a completely white sky.” This environment, 
being all the same color, has zero variance, and a high-quality rendering could be 
computed using the Arnold global illumination rendering system [167] using a 
relatively modest 81 rays to sample the incident light at each pixel. The only source 
of illumination variance at each surface point is due to visibility: some directions 
see out to the sky, whereas others see indirect light from the other surfaces in the 
scene. Because the color of the indirect light from these other surfaces is also low 
in variance and not radically different from the light arriving from the sky, the 


5 The virtual Parthenon model was created by laser scanning the monument and using an inverse global illumination 
process leveraging image-based lighting to solve for its surface colors and reflectance properties from photographs, as 
described in Debevec et al. [163]. 

FIGURE 9.23 A light probe image of a sunny sky, taken with a fish-eye lens. Pixels on the
sun disk (much smaller than the saturated region seen here) are on the order of 100,000 times
brighter than the average color of the rest of the sky.


total variance remains modest and the illumination is accurately sampled using a 
relatively small number of rays. 

In contrast to the perfectly even sky, Figure 9.23 shows a clear sky with a directly 
visible sun, acquired using the technique in Stumpfel et al. [200]. The half-degree 
disk of the sun contains over half the sky’s illumination but takes up only one hun- 
dred thousandth of the sky’s area, giving the sunny sky a very high variance. If we 
use such a high-variance environment to illuminate the Parthenon model, we ob- 
tain the noisy rendering seen in Figure 9.24. For most of the pixels, none of the 
81 rays sent out to sample the illumination hit the small disk of the sun, and thus 
they appear only to be lit by the light from the sky and clouds. For the pixels where 
one of the rays did hit the sun, the sun’s contribution is greatly overestimated, because
the sun is counted as 1/81 of the lighting environment rather than one hundred

FIGURE 9.24 The virtual Parthenon rendered using the sunny sky light probe in Figure 9.23
with 81 lighting samples per pixel. The appearance is speckled because 81 samples distributed
randomly throughout the sky are not enough to accurately sample the small disk of the sun.


thousandth. These pixels are far brighter than what can be shown in an LDR image, 
and their intensity is better revealed by blurring the image somewhat (as in Fig- 
ure 9.25). On average, this image shows the correct lighting, but the appearance is 
strange because the light from the sun has been squeezed into a random scattering 
of overly bright pixels. 
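
The speckled appearance follows directly from the numbers quoted above. The sketch below assumes rays distributed uniformly over the sky area (ignoring cosine weighting and occlusion, which the text does not quantify) and uses the stated figures of 81 rays and a sun covering one hundred thousandth of the sky. The function name is an illustrative assumption.

```python
def sun_hit_statistics(n_rays=81, sun_area_fraction=1e-5):
    """Chance that at least one of n_rays uniformly random rays hits a source
    covering sun_area_fraction of the sky, and the factor by which a single
    hit overweights that source."""
    p_miss_all = (1.0 - sun_area_fraction) ** n_rays
    p_at_least_one_hit = 1.0 - p_miss_all
    # A hit is counted as 1/n_rays of the environment instead of sun_area_fraction.
    overestimate = (1.0 / n_rays) / sun_area_fraction
    return p_at_least_one_hit, overestimate

p_hit, factor = sun_hit_statistics()
print(f"pixels with a sun hit: {p_hit:.4%}; overweighting per hit: about {factor:.0f}x")
```

Under these assumptions fewer than one pixel in a thousand receives any sunlight, and each such pixel overweights the sun by roughly three orders of magnitude, consistent with the sparse, overly bright speckles described above.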

The noisy Parthenon rendering shows what is known as the sampling problem. For 
lighting environments with concentrated sources, a very large number of rays needs 
to be sent in order to avoid noisy renderings. For the sunny sky, hundreds of thou- 
sands of rays would need to be sent to reliably sample the small sun and the large 
sky, which is computationally impractical. Fortunately, the number of rays can be 

FIGURE 9.25 A Gaussian blurred version of Figure 9.24, showing more clearly the amount of
reflected sun energy contained in just a few of the image pixels.


reduced to a manageable number using several techniques. In each technique, the 
key idea is to give the rendering algorithm a priori information about how to sample 
the illumination environment efficiently. These techniques (described in the sections
that follow) include light source identification, light source constellations, and importance sampling.


9.6.1 IDENTIFYING LIGHT SOURCES 


The ray sampling machinery implemented in traditional global illumination algo- 
rithms typically only samples the indirect lighting within a scene. Concentrated 
light sources are assumed to be explicitly modeled and taken into account in the 
direct lighting calculation in which rays are sent to these lights explicitly. In image-based
lighting, both direct and indirect sources are effectively treated as indirect
illumination, which strains the effectiveness of this sampling machinery. 

Many IBL environments can be partitioned into two types of areas: small con- 
centrated light sources and large areas of indirect illumination. These types of en- 
vironments fare poorly with simplistic sampling algorithms, in that much of the 
illumination is concentrated in small regions that are easily missed by randomly 
sampled rays. One method of avoiding this problem is to identify these small con- 
centrated light regions and convert them into traditional area light sources. These 
new area light sources should have the same shape, direction, color, and intensity 
as seen in the original image, and the corresponding bright regions in the image- 
based lighting environment must be removed from consideration in the sampled 
lighting computation. This yields the type of scene that traditional global illumina- 
tion algorithms are designed to sample effectively. 

For the sunny sky light probe, this process is straightforward. The direction of the 
sun can be determined as the centroid of the brightest pixels in the image, converted 
to a world direction vector using the angular mapping formula in Section 9.4.2. As 
mentioned earlier in this chapter, the size and shape of the sun is known to be a 
disk whose diameter subtends 0.53 degrees of the sky. The color and intensity of 
the sun can be obtained from the light probe image as the average RGB pixel value 
of the region covered by the sun disk. In a typical rendering system a specification 
for such a light source might look as follows. 


light sun directional {
    direction -0.711743 -0.580805 -0.395078
    angle 0.532300
    color 10960000 10280000 866000
}
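
The direction-finding step described above can be sketched as follows. This is a minimal illustration, not the procedure used to produce the specification shown: the brightness threshold, the function names, and in particular the angular-map convention (image center looking along the -z axis, with the angle from that axis proportional to the distance from the image center) are assumptions that may differ from the formula in Section 9.4.2.

```python
import math

def brightest_centroid(luminance, fraction_of_peak=0.5):
    """Centroid (row, col) of all pixels at least fraction_of_peak times the
    peak value. The threshold is an illustrative assumption."""
    peak = max(max(row) for row in luminance)
    row_sum = col_sum = count = 0
    for r, row in enumerate(luminance):
        for c, value in enumerate(row):
            if value >= fraction_of_peak * peak:
                row_sum += r
                col_sum += c
                count += 1
    return row_sum / count, col_sum / count

def angular_map_direction(row, col, size):
    """Unit direction for a (possibly fractional) pixel position in a
    size x size angular-map probe, under the assumed convention."""
    u = 2.0 * (col + 0.5) / size - 1.0      # -1 at the left edge, +1 at the right
    v = 1.0 - 2.0 * (row + 0.5) / size      # +1 at the top edge, -1 at the bottom
    phi = math.pi * math.hypot(u, v)        # angle away from the -z axis
    theta = math.atan2(v, u)
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            -math.cos(phi))

# Tiny synthetic probe: a dim sky with a saturated 2 x 2 "sun" patch.
probe = [[1.0] * 8 for _ in range(8)]
for r in (2, 3):
    for c in (5, 6):
        probe[r][c] = 100000.0
sun_row, sun_col = brightest_centroid(probe)
sun_direction = angular_map_direction(sun_row, sun_col, size=8)
```

The resulting unit vector plays the role of the `direction` entry in a light specification such as the one above; the color would be obtained separately by averaging the HDR pixel values over the sun disk.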


Figure 9.26 shows a global illumination rendering of the Parthenon illuminated just 
by the sun light source. For each pixel, the renderer explicitly sends at least one ray 
toward the disk of the sun to sample the sun’s light because the sun is a direct light 
source known a priori to the renderer. Thus, the rendering has none of the noise problems
seen in Figure 9.24. Although the rest of the sky is black, the renderer still sends additional
randomly fired rays from each surface to estimate the indirect illumination
arriving from other surfaces in the scene. These effects are most significant in the 
case of shadowed surfaces that are visible to sunlit surfaces, such as the left sides of 

FIGURE 9.26 The Parthenon illuminated only by the sun, simulated as a direct light source.


the front columns. Because the blue skylight scattered by the atmosphere is not be- 
ing simulated, the rendering shows how the Parthenon might look if it were located 
on the moon and illuminated by a somewhat yellowish sun. 

The rest of the sky’s illumination can be simulated using an image-based light- 
ing process, but we first need to make sure that the light from the sun is no longer 
considered to be part of the image-based lighting environment. In some rendering 
systems, it is sufficient to place the new direct light source in front of the IBL en- 
vironment surface and it will occlude the image-based version of the source from 
being hit by indirect sample rays. In others, the corresponding image region should 
be set to black in order to prevent it from being part of the image-based illumi- 
nation. If we remove the sun from the sky and use the remainder of the image as 
an IBL environment, we obtain the rendering seen in Figure 9.27. This rendering, 


FIGURE 9.27 The Parthenon illuminated only by the sky and clouds, with 81 samples per 
pixel. 


although it lacks sunlight, is still a realistic one. It shows the scene approximately as 
if a cloud had passed in front of the sun. 

In practice, it is sometimes necessary to delete a somewhat larger area around 
the light source from the IBL environment, because some light sources have sig- 
nificantly bright regions immediately near them. These regions on their own can 
contribute to the appearance of noise in renderings. Sometimes, these regions are 
due to glare effects from imperfect camera optics. In the case of the sun, forward 
scattering effects in the atmosphere create a bright circumsolar region around the
sun. Because of both of these effects, to create the sky used for Figure 9.27 a region 
covering the circumsolar area was removed from the light probe image, and this 
additional light energy was added to the sun intensity used to render Figure 9.26. 

FIGURE 9.28 An efficient noise-free rendering of the Parthenon made using IBL for the sky and
clouds and a direct light source for the sun, with 81 samples per pixel.


As a result, the edges of shadows in the sun rendering are slightly sharper than they 
should be, but the effect is a very subtle one. 

Finally, we can light the scene with the sun and sky by simultaneously including 
the IBL environment for the sky and clouds and the direct light source for the sun 
in the same rendering, as seen in Figure 9.28. (We could also just add the images 
from Figures 9.26 and 9.27 together.) This final rendering shows the combined 
illumination from the sun and the rest of the sky, computed with a relatively modest 
81 sample rays per pixel. 

Identifying light sources can also be performed for more complex lighting en- 
vironments. The SIGGRAPH 99 Electronic Theater animation Fiat Lux used an IBL 
environment created from HDR images acquired in St. Peter’s Basilica. It was rendered
with the RADIANCE lighting simulation system. The images were assembled
into HDR panoramas and projected onto the walls of a basic 3D model of the Basil- 
ica’s interior, seen in Figure 9.29(b). The illumination within the Basilica consisted 
of indirect light from the walls and ceiling, as well as concentrated illumination 
from the windows and the incandescent lights in the vaulting. To create renderings 
efficiently, each of the concentrated area light sources was identified by drawing
a polygon around the source in the panoramic image, as seen in Figure 9.29(a). 
A simple program computed the average HDR pixel value of the region covered 
by each light source, and created an area light source set to this value for its radi- 
ance. Just like the panoramic image, the vertices of the identified light sources were 
projected onto the walls of the virtual Basilica model, placing the lights at their 
proper 3D locations within the scene. Thus, the image-based illumination changed 
throughout the virtual scene, as the directional light depended on each object’s 3D 
location relative to the light sources. 

Two other details of the procedure used in Fiat Lux are worth mentioning. First, 
the light sources were placed a small distance in front of the walls of the Basilica, 
so that rays fired out from the surfaces would have no chance of hitting the bright 
regions behind the lights without having to set these regions to black. Second, 
the lights were specified to be “illum” light sources, a special RADIANCE light 
source type that is invisible to rays coming directly from the camera or from mirror- 
like reflections. As a result, the lights and windows appeared with their proper 
image-based detail when viewed directly and when seen in the reflections of the 
synthetic objects, even though they had been covered up by direct light sources. 
Figure 9.30(a) shows a frame from Fiat Lux in which a variety of virtual objects 
have been placed within the Basilica using IBL with identified light sources. Because 
“illum” sources were used, the motion-blurred reflections in Figure 9.30(b) reflect 
the original image-based light sources. 

Identifying light sources dramatically reduces the number of rays needed to cre- 
ate noise-free renderings of a scene, and it maintains the realism and accuracy of 
image-based lighting. Having the concentrated sources converted to individual CG 
lights is also useful for art direction, as these light sources can be readily repo- 
sitioned and changed in their color and brightness. These light sources can also 
be used in a rendering system that does not support global illumination, yield- 
ing at least an approximate version of the IBL environment. For some applications, 
however, it is desirable to have a fully automatic method of processing an IBL
environment into a description that can be rendered efficiently. The remainder of this
section presents some of these automatic sampling techniques.

FIGURE 9.29 (a) Identified light sources from (c) corresponding to the windows and the
incandescent lights in the vaulting. The HDR image and the light sources were projected onto a 3D
model of the Basilica interior to form a 3D lighting environment. (b) Basic 3D geometry of the
Basilica interior used as the IBL environment surface. (c) An image-based lighting environment
acquired inside St. Peter’s Basilica.

FIGURE 9.30 (a) A frame from the Fiat Lux animation, showing synthetic objects inserted
into the Basilica, illuminated by the HDR lighting environment using identified light sources.
(b) Another frame from the animation showing HDR motion blur effects in the reflections of the
spinning objects. Shadows and reflections of the objects in the floor were created using the techniques
described in Section 9.7.


9.6.2 CONVERTING A LIGHT PROBE INTO A
CONSTELLATION OF LIGHT SOURCES


As we saw previously, it is possible to reduce the variance in an image-based light- 
ing environment by converting concentrated spots of illumination into direct light 
sources. We can carry this idea further by turning entire lighting environments into 
constellations of light sources. These approximations can eliminate noise from ren- 
derings but can introduce aliasing in shadows and highlights when not enough light 
sources are used. When the light source directions are chosen with care, accurate 
illumination can be created using a manageable number of light sources, and these 
lights can be used in either traditional or global illumination rendering systems. 

In general, this approach involves dividing a light probe image into a number of 
regions, and then creating a light source corresponding to the direction, size, color, 
and intensity of the total light coming from each region. Each region can be rep- 
resented either by a point light source, or by an area light source corresponding to 
the size and/or shape of the region. Figure 9.31 shows perhaps the simplest case of approximating
an IBL environment with a constellation of point light sources. A light probe
taken within St. Peter’s Basilica was converted to a cube map (Figure 9.31(a)), and 
each face of the cube map was resized to become a square of just 10 by 10 pix- 
els, seen in Figure 9.31(a). Then, a point light source was placed in the direction 
corresponding to each pixel on each face of the cube, and set to the color and in- 
tensity of the corresponding pixel, yielding 600 light sources. These light sources 
produced a low-resolution point-sampled version of the image-based lighting en- 
vironment. The technique has the attractive quality that there is no sampling noise 
in the renderings, as rays are always traced to the same light locations. However, the 
technique can introduce aliasing, because the finite number of lights may become 
visible as stair-stepped shadows and fragmented specular reflections. 
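
A minimal sketch of this cube-map construction follows. The face names and the orientation of each face (the `FACE_TO_XYZ` table) are assumptions chosen for illustration, since cube-map conventions vary between systems; the function name is likewise hypothetical.

```python
import math

# Assumed face orientation: (s, t) in [-1, 1] on each face, with t pointing up.
FACE_TO_XYZ = {
    "+x": lambda s, t: (1.0, t, -s),
    "-x": lambda s, t: (-1.0, t, s),
    "+y": lambda s, t: (s, 1.0, -t),
    "-y": lambda s, t: (s, -1.0, t),
    "+z": lambda s, t: (s, t, 1.0),
    "-z": lambda s, t: (-s, t, -1.0),
}

def cube_map_point_lights(faces):
    """faces maps a face name to an n x n grid of (r, g, b) pixels.
    Returns one (unit_direction, rgb) point light per pixel."""
    lights = []
    for name, grid in faces.items():
        n = len(grid)
        for i, row in enumerate(grid):
            for j, rgb in enumerate(row):
                s = 2.0 * (j + 0.5) / n - 1.0   # pixel center, left to right
                t = 1.0 - 2.0 * (i + 0.5) / n   # pixel center, top to bottom
                x, y, z = FACE_TO_XYZ[name](s, t)
                length = math.sqrt(x * x + y * y + z * z)
                lights.append(((x / length, y / length, z / length), rgb))
    return lights

# Six hypothetical 10 x 10 faces of uniform gray yield 600 lights.
gray_faces = {name: [[(0.5, 0.5, 0.5)] * 10 for _ in range(10)]
              for name in FACE_TO_XYZ}
lights = cube_map_point_lights(gray_faces)
```

Pixels near the corners of a cube face subtend a smaller solid angle than pixels at its center, so a more careful implementation might scale each light's intensity accordingly; the sketch simply copies the pixel color, as in the description above.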

Figure 9.31(b) shows the results of rendering a small scene using this constella- 
tion of light sources. For both the diffuse figure and the glossy red ball the rendering 
is free of noise and artifacts, though the shadows and highlights from the windows 
and lights are not in precisely the right locations due to the finite resolution of the 

FIGURE 9.31 (a) A light probe image taken inside St. Peter’s Basilica in cube map format. (b)
An approximation of the St. Peter’s lighting using a 10 × 10 array of point lights for each cube
face. The Buddha model is courtesy of the Stanford computer graphics laboratory.

light source constellation. For the shiny sphere, simulating the illumination from
the set of point lights makes little sense because there is a vanishingly small prob- 
ability that any particular reflected ray would precisely hit one of the point lights. 
Instead, it makes sense to use ray tracing to reflect these rays directly into the image- 
based lighting environment, as described in case 2 in Section 9.5. In the rendering, 
the mirrored sphere is shown reflecting an illustrative image composed of spots for 
each point light source. 

The quality of approximating an IBL environment with a finite number of lights 
can be significantly increased if the light sources are chosen in a manner that con- 
forms to the distribution of the illumination within the light probe image. One 
strategy for this is to have each light source represent approximately the same quan- 
tity of light in the image. Taking inspiration from Heckbert’s median-cut color 
quantization algorithm [173], we can partition a light probe image in the rectangular
latitude-longitude format into 2^n regions of similar light energy as follows.


1 Add the entire light probe image to the region list as a single region. 

2 For each region, subdivide along the longest dimension such that its light 
energy is divided evenly. 

3 If the number of iterations is less than n, return to step 2.


For efficiency, calculating the total energy within regions of the image can be accel- 
erated using summed area tables [159]. Once the regions are selected, a light source 
can be placed in the center of each region, or alternately at each region’s energy cen- 
troid, to better approximate the spatial distribution of the light within the region. 
Figure 9.32 shows the Grace Cathedral lighting environment partitioned into 256 
light sources, and Figure 9.33 shows a small scene rendered with 16, 64, and 256 
light sources chosen in this manner. Applying this technique to our simple diffuse 
scene, 64 lights produce a close approximation to a well-sampled and computa- 
tionally intensive global illumination solution, and the 256-light approximation is 
nearly indistinguishable. 

FIGURE 9.32 (a) The Grace Cathedral light probe image subdivided into 256 regions of equal
light energy using the median cut algorithm. (b) The 256 light sources chosen as the energy
centroids of each region. All of the lights are approximately equal in intensity. A rendering made
using these lights appears in Figure 9.33(c).

FIGURE 9.33 Noise-free renderings (a through c) in the Grace Cathedral lighting environment
as approximated by (a) 16, (b) 64, and (c) 256 light sources chosen with the median cut algorithm.
(d) An almost noise-free Monte Carlo IBL rendering needing 4,096 randomly chosen rays per pixel.

A few implementation details should be mentioned. First, computing the total
light energy is most naturally performed on monochrome pixel values rather than
RGB colors. Such an image can be formed by adding together the color channels of
the light probe image, optionally weighting them in relation to the human eye’s
sensitivity to each color channel.⁶ Partitioning decisions are made on this monochrome
image, and light source colors are computed using the corresponding regions in the
original color image. Second, the latitude-longitude format overrepresents the area
of regions near the poles. To compensate for this, the pixels of the probe image
should first be scaled by cos φ. Additionally, determining the longest dimension of
a region should also take this stretching into account. This can be approximated by
multiplying a region’s width by cos φ, where φ is taken to be the angle of inclination
of the middle of the region.

6 Following ITU-R BT.709, the formula used to convert RGB color to monochrome luminance is
Y = 0.2125R + 0.7154G + 0.0721B.
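
The three-step procedure and the implementation details above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than a reference implementation: it takes a monochrome latitude-longitude image whose width is twice its height (so that pixels are square in angle at the equator), treats φ as the latitude of a row measured from the equator, and leaves single-pixel regions unsplit. All function names are hypothetical.

```python
import math

def median_cut(luminance, n):
    """Split a latitude-longitude luminance image into up to 2**n regions of
    roughly equal energy. Regions are half-open boxes (r0, c0, r1, c1)."""
    h, w = len(luminance), len(luminance[0])
    # Summed area table of cos(latitude)-weighted energy:
    # sat[r][c] holds the total over rows < r and columns < c.
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        weight = math.cos((0.5 - (r + 0.5) / h) * math.pi)
        for c in range(w):
            e = luminance[r][c] * weight
            sat[r + 1][c + 1] = e + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]

    def energy(r0, c0, r1, c1):
        return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]

    def split(region):
        r0, c0, r1, c1 = region
        rows, cols = r1 - r0, c1 - c0
        if rows == 1 and cols == 1:
            return [region]                     # a single pixel cannot be divided
        # Compare extents on the sphere: shrink the width by cos(latitude)
        # evaluated at the middle of the region.
        mid_lat = (0.5 - 0.5 * (r0 + r1) / h) * math.pi
        along_cols = rows == 1 or (cols > 1 and cols * math.cos(mid_lat) >= rows)
        half = 0.5 * energy(r0, c0, r1, c1)
        if along_cols:
            cut = min(range(c0 + 1, c1),
                      key=lambda c: abs(energy(r0, c0, r1, c) - half))
            return [(r0, c0, r1, cut), (r0, cut, r1, c1)]
        cut = min(range(r0 + 1, r1),
                  key=lambda r: abs(energy(r0, c0, r, c1) - half))
        return [(r0, c0, cut, c1), (cut, c0, r1, c1)]

    regions = [(0, 0, h, w)]
    for _ in range(n):
        regions = [part for region in regions for part in split(region)]
    return regions

def energy_centroid(luminance, region):
    """Energy-weighted (row, col) centroid of a region, in pixel coordinates."""
    r0, c0, r1, c1 = region
    h = len(luminance)
    total = row_sum = col_sum = 0.0
    for r in range(r0, r1):
        weight = math.cos((0.5 - (r + 0.5) / h) * math.pi)
        for c in range(c0, c1):
            e = luminance[r][c] * weight
            total += e
            row_sum += e * (r + 0.5)
            col_sum += e * (c + 0.5)
    if total == 0.0:                            # dark region: fall back to its center
        return 0.5 * (r0 + r1), 0.5 * (c0 + c1)
    return row_sum / total, col_sum / total

# A uniform 8 x 16 image split three times gives 8 regions.
uniform = [[1.0] * 16 for _ in range(8)]
regions = median_cut(uniform, 3)
lights = [energy_centroid(uniform, region) for region in regions]
```

Each centroid would then be converted to a world direction, and the light's color taken from the corresponding region of the original color image. The summed area table makes every energy query constant-time, so the cost of choosing a cut is linear in the length of the side being split.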

Several other light source selection procedures have been proposed for approxi- 
mating light probe images as constellations of light sources. In each, choosing these 
point light source positions is also done with a clustering algorithm. 

The LightGen plug-in [158] for HDR Shop [202] takes a light probe image 
in latitude-longitude format and outputs a user-specified number of point light 
sources in a variety of 3D modeling program formats. LightGen chooses its light 
source positions using a K-means clustering process [184]. In this process, K light 
source positions are initially chosen at random. Then the pixels in the light probe 
image are partitioned into sets according to which light source they are closest 
to. For each set of pixels, the mass centroid of the set is determined such that a 
pixel’s mass is proportional to its total intensity. Then, each of the K light sources is 
moved to the mass centroid of its set. The pixels are repartitioned according to the 
new light source directions and the process is repeated until convergence. Finally, 
each light source is assigned the total energy of all pixels within its set. Figure 9.34 
shows results for K = 40 and K = 100 light sources for a kitchen light probe 
image. 
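
The clustering loop described above can be sketched as follows. This illustrates the general K-means idea only and is not LightGen's actual code: pixels are assumed to be given as unit direction vectors with scalar intensities, "closest" is taken to mean the largest dot product, and each centroid is projected back onto the unit sphere. All of these choices, and the function name, are assumptions.

```python
import math
import random

def kmeans_lights(pixels, k, max_iterations=50, seed=0):
    """pixels: list of (unit_direction, intensity) pairs.
    Returns k (direction, total_energy) lights."""
    rng = random.Random(seed)
    centers = [direction for direction, _ in rng.sample(pixels, k)]
    assignment = []
    for _ in range(max_iterations):
        # Assign each pixel to the closest center (largest dot product).
        new_assignment = [
            max(range(k), key=lambda j: sum(a * b for a, b in zip(d, centers[j])))
            for d, _ in pixels
        ]
        if new_assignment == assignment:
            break                               # converged: no pixel changed sets
        assignment = new_assignment
        # Move each center to the intensity-weighted centroid of its set.
        for j in range(k):
            cx = cy = cz = 0.0
            for (d, weight), owner in zip(pixels, assignment):
                if owner == j:
                    cx += weight * d[0]
                    cy += weight * d[1]
                    cz += weight * d[2]
            norm = math.sqrt(cx * cx + cy * cy + cz * cz)
            if norm > 0.0:
                centers[j] = (cx / norm, cy / norm, cz / norm)
    # Each light receives the total energy of the pixels in its set.
    energies = [0.0] * k
    for (_, weight), owner in zip(pixels, assignment):
        energies[owner] += weight
    return list(zip(centers, energies))

# Four hypothetical pixels forming two clusters on opposite sides of the sphere.
pixels = [((1.0, 0.0, 0.0), 5.0), ((0.8, 0.6, 0.0), 3.0),
          ((-1.0, 0.0, 0.0), 2.0), ((-0.8, -0.6, 0.0), 1.0)]
lights = kmeans_lights(pixels, k=2)
```

Because every pixel belongs to exactly one set, the total energy of the lights equals the total energy of the image regardless of where the clusters settle.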

LightGen tends to cluster the light sources around bright regions, but these light 
sources generally contain more energy than the light sources placed in dimmer 
regions. As a result, dimmer regions receive more samples than they do in the me- 
dian cut algorithm, but more lights may be necessary to approximate the structure 
of shadows. For example, if the kitchen window is approximated as six bright point 
light sources it can be possible to observe multiple distinct shadows from the six 
sources rather than the expected soft shadow from an area light source. 

Kollig and Keller [177] propose several improvements to LightGen’s clustering 
technique. They begin the K-means procedure using a single randomly placed light 
source and then add in one more random light at a time, each time iterating the 
K-means clustering procedure until convergence. This process requires additional 
computation but performs better at placing the light sources within concentrated 
areas of illumination. They also discuss several procedures for reducing aliasing once

FIGURE 9.34 (a) An IBL environment of a kitchen. (b) An approximation of the kitchen
environment made by LightGen using 40 point lights. (c) An approximation using 100 point
lights.


the light source positions are chosen. One of them is to use the light source regions 
as a structure for stratified sampling. In this process, the lighting environment is sampled 
with K rays, with one ray chosen to fire at random into each light source region. To 

FIGURE 9.35 (a) Lights chosen for an outdoor light probe image by the Kollig and Keller
sampling algorithm. (b) An IBL rendering created using lights from the Kollig and Keller algorithm.
(c) Lights chosen for the Galileo’s Tomb light probe image by the Ostromoukhov et al. sampling
algorithm. (d) The Penrose tiling pattern used by Ostromoukhov et al. for choosing light source
positions.


avoid problems with the remaining variance within each region, they propose using 
the average RGB color of each region as the color of any ray fired into the region. 
Lights chosen by this algorithm and a rendering using such lights are shown in 
Figures 9.35(a) and 9.35(b). 

Ostromoukhov et al. [191] use a light source sampling pattern based on the
geometric Penrose tiling to quickly generate a hierarchical sampling pattern with an 
appropriately spaced distribution. The technique is optimized for efficient sampling 
and was able to achieve sampling speeds of just a few milliseconds to generate 
several hundred light samples representing a lighting environment. A set of lights 
chosen using this pattern is shown in Figure 9.35(c), and the Penrose tiling pattern 
used is shown in Figure 9.35(d). 


9.6.3 IMPORTANCE SAMPLING 


As an alternative to converting a light probe image into light sources, one can con- 
struct a randomized sampling technique such that the rays are sent in a distribution 
that matches the distribution of energy in the light probe image. In the case of the 
sunny sky in Figure 9.23, such an algorithm would send rays toward the disk of the 
sun over half the time, since more than half of the light comes from the sun, rather 
than in proportion to the sun’s tiny area within the image. This form of sampling 
can be performed using a mathematical technique known as importance sampling. 
Importance sampling was introduced for the purpose of efficiently evaluating
integrals using Monte Carlo techniques [185] and has since been used in a variety
of ways to increase the efficiency of global illumination rendering (e.g., Veach
and Guibas [204]). Because a computer’s random number generator produces uniformly
distributed samples, a process is needed to transform uniformly distributed
numbers to follow the probability distribution function (PDF) corresponding to the
distribution of light within the image. The process is most easily described in the
context of a 1D function f(x), x ∈ [a, b], as seen in Figure 9.36(a). Based on a desired
PDF, one computes the cumulative distribution function
$g(x) = \int_a^x f(t)\,dt \,\big/ \int_a^b f(t)\,dt$,
as seen in Figure 9.36(b). We note that g(x) increases monotonically (and thus has
an inverse) because f(x) is nonnegative, and that g(x) ranges between 0 and 1. Using
g(x), we can choose samples x_i in a manner corresponding to the distribution
of energy in f(x) by choosing values y_i uniformly distributed in [0, 1] and letting
$x_i = g^{-1}(y_i)$. This process is shown graphically for four samples in Figure 9.36(b).
To see why this process works, note that each bright light source near a point 
x in the PDF produces a quick vertical jump in the CDF graph in the area of g(x). 
Seen from the side, the jump produces a flat span whose length is proportional to 

FIGURE 9.36 Importance sampling. (a) A plot of the brightness of a 256 × 1 pixel region
(seen below) of the St. Peter’s light probe image that intersects several bright windows, forming
a probability distribution function (PDF). The image region is displayed below the graph using
an HDR glare effect. (b) A graph of the cumulative distribution function (CDF) of the region
from left to right for importance sampling. Four randomly chosen samples y_i (indicated by small
horizontal arrows) are followed right until they intersect the graph (indicated by the diamonds),
and are then followed down to their corresponding image pixel samples x_i.


the amount of the scene’s energy contained by the light. This flat span becomes a 
likely landing spot for randomly chosen samples y_i, with a likelihood proportional
to the light’s energy within the scene. When a sample y_i falls within the span, it
produces a sample of the light source near x. In Figure 9.36(b) we see that three of 
the four random samples fell on spans corresponding to areas within bright light 

FIGURE 9.37 (a) A noisy IBL rendering in the St. Peter’s lighting environment (Figure 9.31(a))
using 64 randomly distributed samples per pixel. (b) A much smoother rendering using 64 samples
per pixel chosen using the importance sampling technique described in Pharr et al. [192].


sources. The sample furthest to the top right did not land on a steep slope, and thus 
produced a sample in one of the dimmer areas of the image. 
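
The 1D inversion process can be sketched for a discrete strip of pixels as follows. The strip values and function names are illustrative assumptions; the inversion itself is a binary search for the first pixel whose cumulative value exceeds the uniform sample y.

```python
import bisect
import random

def cumulative_distribution(f):
    """Normalized running sum g of the nonnegative samples f; the last entry is 1."""
    running, total = [], 0.0
    for value in f:
        total += value
        running.append(total)
    return [r / total for r in running]

def importance_sample(g, y):
    """Invert the CDF: index of the first pixel whose cumulative value exceeds
    y, for y uniformly distributed in [0, 1)."""
    return bisect.bisect_right(g, y)

# Hypothetical 8-pixel strip with two bright "windows" at pixels 2 and 5.
strip = [1, 1, 40, 1, 1, 55, 1, 0]
g = cumulative_distribution(strip)
rng = random.Random(1)
samples = [importance_sample(g, rng.random()) for _ in range(1000)]
bright_share = sum(s in (2, 5) for s in samples) / len(samples)
```

The two bright pixels hold 95 percent of the strip's energy and therefore receive about 95 percent of the samples, while the zero-energy pixel at the end is never chosen because it contributes no rise to the CDF.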

Extending this technique from 1D functions to 2D images is straightforward, as
one can concatenate each row of an m × n light probe image into a single mn × 1
pixel vector, as suggested by Agarwal et al. [152]. Pharr and Humphreys [192] propose
an implementation of importance sampling for light probe images within their
Physically-Based Rendering package [193], in which importance sampling is first
used to compute which column of the image to sample (the energy of each column
x is summed to produce f(x)) and a sample is then taken from the chosen image column
using 1D importance sampling (as described previously). Figure 9.37 compares the
results of using this sampling algorithm to using randomly distributed rays. As
desired, the rendering using importance sampling yields dramatically less noise than
random sampling for the same number of samples.

To produce renderings that converge to the correct solution, light values chosen 
using importance sampling must be weighted in a manner that is inversely propor- 
tional to the degree of preference given to them. For example, sampling the sunny 
sky of Figure 9.23 with importance sampling would on average send over half of 
the rays toward the sun, as the sun contains over half the light in the image. If 
we weighted the radiance arriving from all sampled rays equally, the surface point
would be illuminated as if the sun were the size of half the entire sky.
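
The weighting requirement can be made concrete with a small sketch that combines the column-then-row sampling described earlier with a weight of one over the probability of each chosen pixel. It is a minimal illustration, not the implementation of Pharr and Humphreys: the image is a plain grid of nonnegative energies, the function `g` stands in for whatever per-direction factor (visibility, surface response) the renderer applies, and all names are hypothetical.

```python
import bisect
import random

def normalized_cdf(values):
    running, total = [], 0.0
    for value in values:
        total += value
        running.append(total)
    # An all-zero column is never selected, so any placeholder CDF will do.
    return [r / total for r in running] if total > 0 else [1.0] * len(running)

def make_sampler(image):
    """Returns sample(rng) -> (row, col, probability of choosing that pixel)."""
    h, w = len(image), len(image[0])
    col_sums = [sum(image[r][c] for r in range(h)) for c in range(w)]
    total = float(sum(col_sums))
    col_cdf = normalized_cdf(col_sums)
    row_cdfs = [normalized_cdf([image[r][c] for r in range(h)]) for c in range(w)]

    def sample(rng):
        c = bisect.bisect_right(col_cdf, rng.random())      # choose a column first
        r = bisect.bisect_right(row_cdfs[c], rng.random())  # then a pixel within it
        return r, c, image[r][c] / total
    return sample

def estimate(image, g, n_samples, seed=0):
    """Importance-sampled estimate of the sum over pixels of image * g."""
    rng = random.Random(seed)
    sample = make_sampler(image)
    accumulated = 0.0
    for _ in range(n_samples):
        r, c, pdf = sample(rng)
        # Weight each sample inversely to the preference it was given.
        accumulated += image[r][c] * g(r, c) / pdf
    return accumulated / n_samples

# Hypothetical 2 x 2 environment dominated by one very bright pixel.
sky = [[1.0, 1.0], [1.0, 1000.0]]
total_light = estimate(sky, lambda r, c: 1.0, n_samples=100)
right_half = estimate(sky, lambda r, c: 1.0 if c == 1 else 0.0, n_samples=20000)
```

With the weights in place the estimate converges to the true sum (1003 for the whole image, 1001 for its right column) even though nearly all samples land on the bright pixel. Omitting the division by `pdf` would instead reproduce the error described above, in which the bright source is treated as if it filled the share of the sky matching its share of the samples.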

A deficiency of these light probe sampling techniques is that they sample the 
lighting environment without regard to which parts of the environment are visible 
to a given surface point. Agarwal et al. [152] present a structured importance sampling 
approach for IBL that combines importance sampling with the conversion to light 
sources in order to better anticipate variable visibility to the environment. They 
first threshold the image into regions of similar intensity, and use the Hochbaum-Shmoys
clustering algorithm to subdivide each region a number of times proportional
to both its size and its total energy. In this way, large dim regions do not
become undersampled, which improves the rendering quality of shadowed areas. 
Likewise, small bright regions are represented with fewer samples relative to their 
intensity, in that their small spatial extent allows them to be accurately modeled 
with a relatively small number of samples. Cohen [157] constructs a piecewise- 
constant importance function for a light probe image, and introduces a visibility cache 
that exploits coherence in which parts of the lighting environment are visible to 
neighboring pixels. With this cache, significantly fewer rays are traced in directions 
that are occluded by the environment. 


9.7 SIMULATING SHADOWS AND SCENE-OBJECT 
INTERREFLECTION 


So far, the IBL techniques presented simulate how the light from an environment 
illuminates CG objects, but not how the objects in turn affect the appearance of 
the environment. The most notable effect to be simulated is the shadow an object 
casts beneath itself. However, shadows are just part of the lighting interaction that 
should be simulated. Real objects also reflect light back onto the scene around them. 

FIGURE 9.38 Adding synthetic objects that cast shadows. (a) A background plate taken near
the location of the light probe image in Figure 9.14. It shows the radiance L of pixels in the local
scene. (b) A light probe image taken within the scene. (c) A diffuse surface is added to the scene at
the position of the ground and is rendered as lit by the surrounding IBL environment. The resulting
pixel values on the surface produce an estimate of the irradiance E at each pixel. (d) Dividing a
by c yields the lighting-independent texture map for the local scene. (e) A CG statue and four CG
spheres are added on top of the local scene and illuminated by the IBL environment. The CG objects
cast shadows and interreflect light with the real scene.


For example, a red ball on the floor will make the floor around it somewhat red- 
der. Effects such as this are often noticed only subconsciously but can contribute 
significantly to the realism of rendered images. 


The examples we have seen so far from Rendering with Natural Light and the Parthenon 
did not need to simulate shadows back into the scene. The Parthenon model was 
a complete environment with its own surrounding ground. In RNL, the pedestal 
received shadows from the objects but the pedestal itself was part of the computer- 
generated scene. The example we show in this section involves adding several CG 
objects into the photograph of a museum plaza (seen in Figure 9.38(a)) such that
the objects realistically shadow and reflect light with the ground below them (see
Figure 9.38(e)). In this example, the light probe image that corresponds to this
background plate is the sunny lighting environment shown in Figure 9.14.

Usually, the noticeable effect a new object has on a scene is limited to a local 
area near the object. We call this area the local scene, which in the case of the scene 
in Figure 9.38(a) is the ground in front of the museum. Our strategy for casting
shadows on the local scene is to convert the local scene into a CG representation 
of its geometry and reflectance that can participate in the lighting simulation along 
with the CG objects. We can then use standard IBL machinery to light the objects 
and the local scene together by the IBL environment. The local scene model does not 
need to have the precise geometry and reflectance of what it represents in the real 
world, but it does need to satisfy two properties. First, it should have approximately 
the same geometry as the real scene it represents, in that the shadows reveal some
of the scene structure. Second, it should have reflectance properties (e.g., texture 
maps) that cause it to look just like the real local scene when illuminated on its own 
by the IBL environment. 

Modeling the geometry of the local scene can be done by surveying the scene or 
using photogrammetric modeling techniques (e.g., Debevec et al. [165]) or soft- 
ware (e.g., ImageModeler by Realviz, www.realviz.com). Because the geometry needed 
is often very simple, it can also be modeled by eye in a 3D modeling package. Sim- 
ilar techniques should also be used to recover the camera position, rotation, and 
focal length used to take the background photograph. 

Obtaining the appropriate texture map for the local scene is extremely simple. We 
first assign the local scene a diffuse white color. We render an image of the white 
local scene as lit by the IBL lighting environment, as in Figure 9.38(c). We then 
divide the image of the real local scene (Figure 9.38(a)) in the background plate by 
the rendering to produce the reflectance image for the local scene (Figure 9.38(d)). 
Finally, we texture map the local scene with this reflectance image using camera 
projection (supported in most rendering systems) to project the image onto the 
local scene geometry. If we create a new IBL rendering of the local scene using 
this texture map, it will look just as it did in the original background plate. The 
difference is that it now participates in the lighting calculation, and thus can receive 
shadows and interreflect light with synthetic objects added to the scene. 

Figure 9.38(e) shows the result of adding a 3D model of a sculpture and four 
spheres on top of the local scene geometry before computing the IBL rendering. 
The objects cast shadows on the ground as they reflect and refract the light of the 
scene. One can also notice bounced beige light on the ground near the beige sphere.
The color and direction of the shadows are consistent with the real shadow cast on 
the ground by a post to the right of the camera’s view. 
The reason the reflectance-solving process works derives from the fact that the
pixel color of a diffuse surface is obtained by multiplying its surface color by the
color and intensity of the light arriving at the surface. This calculation was described
in detail in case 3 of Section 9.5. More technically stated, the surface’s radiance $L$
is its reflectance $\rho$ times the irradiance $E$ at the surface. Inverting this equation,
we have $\rho = L/E$. The background plate image shows the surface radiance $L$ of
each pixel of the local scene. The IBL rendering of the diffuse white version of the
local scene yields an estimate of the irradiance $E$ at every point on the local scene.
Thus, dividing $L$ by the estimate of $E$ yields an image $\rho$ of the surface reflectance
of the local scene. By construction, assuming the local scene is convex, these are the
reflectance properties that make the local scene look like the background plate when
lit by the IBL environment.
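
The division of the background plate by the white rendering is applied independently to every pixel and color channel. The following is a minimal sketch in Python, assuming both images hold linear radiance values in nested lists of the same size. The function name `solve_reflectance` and the `eps` guard against zero irradiance are illustrative assumptions, not part of the procedure as described.

```python
def solve_reflectance(plate, white_render, eps=1e-6):
    """Return the per-pixel reflectance image (radiance divided by irradiance).

    plate        -- background plate (radiance L), rows of (r, g, b) tuples
    white_render -- IBL rendering of the diffuse white local scene,
                    used as the irradiance estimate E (same layout)
    eps          -- assumed guard so that black pixels in E do not divide by zero
    """
    return [
        [
            tuple(l / max(e, eps) for l, e in zip(l_px, e_px))
            for l_px, e_px in zip(l_row, e_row)
        ]
        for l_row, e_row in zip(plate, white_render)
    ]


# Two pixels of the same ground material: one in full light, one in half light.
plate = [[(0.30, 0.20, 0.10), (0.15, 0.10, 0.05)]]
white_render = [[(0.60, 0.50, 0.40), (0.30, 0.25, 0.20)]]
rho = solve_reflectance(plate, white_render)
```

Both pixels recover the same reflectance, roughly (0.5, 0.4, 0.25), because the difference in illumination between them is divided out.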

This process can be modified slightly for improved results when the local scene 
is nonconvex or non-Lambertian. For a nonconvex local scene such as a chair or a 
staircase, light will reflect between the local scene’s surfaces. That means that our 
estimate of the irradiance E arriving at the local scene’s surfaces should account for 
light from the IBL environment as well as interreflected light from the rest of the 
local scene. The process described earlier does this, but the indirect light is com- 
puted as if the local scene were completely white, which will usually overestimate 
the amount of indirect light received from the other surfaces. As a result, the local 
scene’s reflectance will be underestimated, and the local scene’s texture map will be
too dark in concave areas. 

The way to avoid this problem is to assume more accurate reflectance properties
for the local scene before computing the irradiance image. Typical surfaces in the
world reflect approximately 25% of the incident illumination, and thus assuming
an initial surface reflectance of $\rho_0 = 0.25$ is a more reasonable initial guess. After
rendering the local scene, we should compute the irradiance estimate $E_0$ as
$E_0 = L_0/\rho_0$, where $L_0$ is the rendering generated by lighting the local scene by the IBL
environment. This is another simple application of the $L = \rho E$ formula. Then the
texture map values for the local scene can be computed as before, as $\rho_1 = L/E_0$.

If the local scene exhibits particularly significant self-occlusion or spatially varying
coloration, the reflectance estimates can be further refined by using the computed
reflectance properties $\rho_1$ as a new initial estimate for the same procedure.
We illuminate the new local scene by the IBL environment to obtain $L_1$, estimate a
new map of the irradiance as $E_1 = L_1/\rho_1$, and finally form a new estimate for the
local scene’s per-pixel reflectance, $\rho_2 = L/E_1$. Typically, the process converges quickly to the
solution after one or two iterations. In fact, this basic process was used to derive the
surface reflectance properties of the Parthenon model seen in Figure 9.28, in which
the “local scene” was the entire monument [163].
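
The refinement loop can be illustrated with a deliberately tiny stand-in for the renderer. The sketch below assumes a hypothetical scene of two diffuse patches that exchange light through a single form factor; `render_two_patch` is not a real IBL renderer, and the direct irradiance values and form factor are made up for the example. What it shows is the iteration itself: render with the current reflectance guess, divide to estimate the irradiance, then divide the plate by that estimate.

```python
def render_two_patch(rho, direct, form_factor):
    """Toy renderer: radiance of two facing diffuse patches with interreflection.

    Solves L_i = rho_i * (direct_i + form_factor * L_j) in closed form.
    This two-patch model is a hypothetical stand-in for a full IBL rendering.
    """
    (r1, r2), (d1, d2), f = rho, direct, form_factor
    denom = 1.0 - r1 * r2 * f * f
    l1 = r1 * (d1 + f * r2 * d2) / denom
    l2 = r2 * (d2 + f * r1 * d1) / denom
    return (l1, l2)


def refine_reflectance(plate, direct, form_factor, rho0=0.25, iterations=4):
    """Repeat: render with the current guess, estimate E, divide the plate by E."""
    rho = (rho0, rho0)
    history = [rho]
    for _ in range(iterations):
        rendered = render_two_patch(rho, direct, form_factor)    # L_k
        irradiance = tuple(l / r for l, r in zip(rendered, rho))  # E_k = L_k / rho_k
        rho = tuple(l / e for l, e in zip(plate, irradiance))     # next estimate = L / E_k
        history.append(rho)
    return history


true_rho = (0.6, 0.2)       # reflectances we pretend not to know
direct = (1.0, 0.5)         # assumed direct irradiance from the environment
form_factor = 0.3           # assumed coupling between the two patches
plate = render_two_patch(true_rho, direct, form_factor)  # plays the role of the photograph
history = refine_reflectance(plate, direct, form_factor)
```

In this toy setup the first estimate of the darker patch is noticeably too bright (about 0.24 instead of 0.2), and the following iterations pull it back toward the true value, consistent with the quick convergence described above.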

In some cases, we can simplify the surface reflectance estimation process. Often,
the local scene is a flat ground plane and the IBL environment surface is distant or
infinite. In this case, the irradiance $E$ is the same across the entire surface, and can
be obtained by sampling a diffuse convolution of the light probe image in the
upward-pointing direction. This eliminates the need to render a complete irradiance image such
as shown in Figure 9.38(c), and the local scene reflectance is simply its appearance
in the background plate divided by the RGB value $E$.
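
For a flat ground plane, the single irradiance value can be computed directly from a latitude-longitude light probe without rendering. The sketch below is a minimal illustration under several assumptions: row 0 of the image points straight up, pixels are integrated with a midpoint rule, and the result is divided by π so that a uniform environment of radiance 1 yields an irradiance of 1, which matches the convention that radiance equals reflectance times irradiance. The function name is hypothetical.

```python
import math


def upward_irradiance(probe):
    """Cosine-weighted integral of a latitude-longitude probe over the upper hemisphere.

    probe -- rows of (r, g, b) radiance values; row 0 is assumed to be straight up
             and the last row straight down.
    The result is divided by pi (an assumption) so that a uniform environment of
    radiance 1 gives an irradiance of 1.
    """
    height, width = len(probe), len(probe[0])
    d_theta = math.pi / height
    d_phi = 2.0 * math.pi / width
    total = [0.0, 0.0, 0.0]
    for row, pixels in enumerate(probe):
        theta = (row + 0.5) * d_theta          # polar angle at the pixel center
        cos_theta = math.cos(theta)
        if cos_theta <= 0.0:                   # below the horizon: no contribution
            continue
        # cosine falloff times the solid angle covered by one pixel in this row
        weight = cos_theta * math.sin(theta) * d_theta * d_phi
        for pixel in pixels:
            for c in range(3):
                total[c] += pixel[c] * weight
    return tuple(t / math.pi for t in total)


# A uniform white environment should give an irradiance close to 1 in every channel.
uniform = [[(1.0, 1.0, 1.0)] * 64 for _ in range(32)]
E = upward_irradiance(uniform)
```

Dividing each background plate pixel of the ground by this RGB value then gives the ground reflectance.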


9.7.1 DIFFERENTIAL RENDERING 


The need to use camera projection to map the local scene texture onto its geometry
can be avoided using the differential rendering technique described in [161]. In
differential rendering, the local scene is assigned (often uniform) diffuse and specular
surface reflectance properties similar to the reflectance of the local scene. These
properties are chosen by hand or computed using the reflectance estimation process
described previously. Then, two renderings are created: one of the local scene and
the objects together ($L_{obj}$), and one of the local scene without the objects ($L_{noobj}$).
For $L_{obj}$, an alpha channel image $\alpha$ is created that is 1 for the object pixels and
0 for the non-object pixels, preferably with antialiasing and transparency encoded
as gray levels. If $L_{noobj}$ and $L_{obj}$ are the same, there is no shadowing. Where $L_{obj}$ is
darker, there are shadows, and where $L_{obj}$ is brighter there are reflections or indirectly
bounced light. To apply these photometric effects to the background plate $L$,
we offset its pixel values by the difference between $L_{obj}$ and $L_{noobj}$. Specifically, the
final rendering is computed as


$$L_{final} = \alpha L_{obj} + (1 - \alpha)(L + L_{obj} - L_{noobj}).$$


In this formula, the $\alpha$ mask allows $L_{final}$ to copy the appearance of the objects
directly from $L_{obj}$, and the local scene is rendered using differential rendering.

As an alternative method, we can apply the ratio of $L_{obj}$ and $L_{noobj}$ to the background
plate, changing the last term of the formula to $L \times L_{obj} / L_{noobj}$. If the reflectance
properties of the local scene and the IBL environment are modeled accurately,
the background plate $L$ and the local scene lit by the IBL environment $L_{noobj}$
would be the same. In this case, the local scene’s appearance in $L_{obj}$ is copied to
$L_{final}$ regardless of whether the difference or the ratio formula is used. When there
are inaccuracies in either the lighting or the reflectance, either formula may yield a
convincing approximation to the correct result. The difference formula may provide
better results for specular reflections and the ratio formula may provide better results
for shadows. In either case, only differences between $L_{obj}$ and $L_{noobj}$ will modify the
background plate, and where the objects do not affect the local scene it will look
precisely as it did in the background plate.
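
Both compositing variants reduce to a few arithmetic operations per pixel and channel. The sketch below is a minimal illustration for a single channel value; the function name, the `mode` strings, and the `eps` guard in the ratio form are assumptions made for the example.

```python
def composite_pixel(plate, l_obj, l_noobj, alpha, mode="difference", eps=1e-6):
    """Differential rendering for one color channel of one pixel.

    plate   -- background plate value
    l_obj   -- rendering of the local scene together with the objects
    l_noobj -- rendering of the local scene alone
    alpha   -- object coverage in [0, 1]
    mode    -- "difference" or "ratio" (names chosen here for illustration)
    """
    if mode == "difference":
        local = plate + l_obj - l_noobj
    elif mode == "ratio":
        local = plate * l_obj / max(l_noobj, eps)  # eps guard is an assumption
    else:
        raise ValueError("mode must be 'difference' or 'ratio'")
    return alpha * l_obj + (1.0 - alpha) * local


# Untouched ground: both renderings agree, so the plate passes through unchanged.
unchanged = composite_pixel(0.8, 0.5, 0.5, 0.0)
# Shadowed ground: the rendering with objects is darker than the one without.
shadow_diff = composite_pixel(0.8, 0.3, 0.5, 0.0, mode="difference")
shadow_ratio = composite_pixel(0.8, 0.3, 0.5, 0.0, mode="ratio")
# Object pixel: copied straight from the rendering with objects.
object_pixel = composite_pixel(0.8, 0.9, 0.5, 1.0)
```

In this example the plate is brighter than the modeled local scene (0.8 versus 0.5), so the two variants disagree in the shadow: the difference form darkens the plate by a fixed amount (to about 0.6), whereas the ratio form scales it (to about 0.48).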

The benefit of this technique is that the local scene does not need to be pro- 
jectively texture mapped with its appearance in the background plate image. The 
drawback is that mirror- and glass-like CG objects will not reflect images of the 
original local scene. Instead, they will reflect images of the modeled local scene. 


9.7.2 RENDERING INTO A NONDIFFUSE LOCAL SCENE


If the local scene is somewhat shiny, we would like the new objects to also appear 
in reflections in the scene. This was the case for Fiat Lux (Figure 9.30), where the 
marble floor of St. Peter’s Basilica is notably specular. The problem is compounded 
by the fact that a shiny local scene may already have visible specularities from bright 
parts of the lighting environment, and these reflections should disappear when vir- 
tual objects are placed between the light sources and the observed locations of their 
specular reflections. Thus, the synthetic local scene needs to model the specular as 
well as the diffuse reflection characteristics of the real local scene. Unfortunately, 
estimating spatially varying diffuse and specular reflection components of a surface, 
even under known illumination, is usually prohibitively challenging for current re- 
flectometry algorithms. 

The easiest procedure to follow is to first manually remove visible specular re- 
flections in the local scene using an image-editing program. In Fiat Lux, the notable 
specular reflections in the floor were from the windows, and in some cases from 
the lights in the vaulting. Using the edited background plate, we then solve for the 
local scene reflectance assuming that it is diffuse (as described previously). For the
final rendering, we add a specular component to the local scene reflectance whose
intensity and roughness are selected by hand to match the appearance of the spec- 
ularities seen in the local scene on the background plate. The IBL rendering will 
then show the new CG objects reflecting in the local scene according to the spec- 
ified specular behavior, and light sources in the IBL environment will also reflect 
in the local scene when their light is not blocked by the CG objects. This process 
also provides opportunities for art direction. In Fiat Lux, for example, the floor of 
St. Peter’s was chosen to have a more polished specular reflection than it really had, 
to increase the visual interest of its appearance in the animation. 

Sometimes a local scene’s specular reflectance dominates its diffuse reflectance, 
as would be seen for a steel or black marble floor. In these cases, it can be difficult 
to remove the specular reflection through image editing. In such a case, the best so- 
lution may be to model the reflectance of the local scene by eye, choosing specular 
intensity and roughness parameters that cause reflections of the IBL environment 
to match the local scene’s original appearance reasonably well. If the scene is avail- 
able for photography under controlled lighting, the local scene can be shaded from 
specular reflections and illuminated from the side to observe its diffuse component. 
If the reflectance of the local scene is especially complex and spatially varying, such 
as an ancient stone and metal inlaid mosaic, one could use a technique such as that 
described in McAllister [183] or Gardner et al. [169] to derive its surface reflectance 
parameters by analyzing a set of images taken from many incident illumination di- 
rections. 


9.8 USEFUL IBL APPROXIMATIONS 


Many of today’s rendering programs include specific support for image-based light- 
ing (often referred to as HDRI), making IBL a straightforward process to use for 
many computer graphics applications. However, not every production pipeline is 
designed to support global illumination, and real-time applications require faster 
rendering times than ray-traced illumination solutions typically allow. Fortunately, 
there are several approximate IBL techniques that allow particularly fast rendering 
times and that can be implemented within more traditional rendering pipelines. 
The sections that follow describe two of them: environment mapping and ambient oc- 
clusion. The discussion includes the advantages and disadvantages of these two ap- 
proaches. 




9.8.1 ENVIRONMENT MAPPING 


Environment mapping [154,186,207,171] is a forerunner of image-based light- 
ing in which an omnidirectional LDR image of an environment is directly texture 
mapped onto an object surface to produce the appearance of it reflecting the en- 
vironment. The omnidirectional environment map or reflection map image can also be 
pre-convolved by a blurring filter to simulate the reflections from rough specular or 
diffuse surfaces. The environment map is mapped onto the object according to each 
point’s surface normal, which makes the rendering process extremely fast once the 
appropriate reflection map images have been computed. The disadvantage is that 
the technique does not take into account how light is shadowed and interreflects 
between object surfaces, which can be an impediment to realism. For objects that 
are relatively shiny and convex, the errors introduced by the approximation can be 
insignificant. For objects with more complex geometry and reflectance properties, 
however, the results can be less realistic. 

Environment mapping is most successful and most often used for simulating the 
specular reflections of an object. In this case, for each surface point the ray from the 
camera R is reflected about the surface normal N to determine the reflected vector 
R’, computed as follows. 


$$R' = R - 2(R \cdot N)N$$


Then, the point on the object is drawn with the pixel color of the environment 
image corresponding to the direction of R’. Figure 9.39(a) shows this environment 
mapping process applied to the scene shown in Figure 9.1. 
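
The reflection lookup can be sketched in a few lines. The example below assumes the environment is stored as a latitude-longitude image with the up direction along +Y and the image center facing −Z; those conventions, the nearest-pixel lookup, and all function names are assumptions for illustration rather than a prescribed layout. The normal is assumed to be unit length.

```python
import math


def reflect(r, n):
    """Mirror the incoming view ray r about the unit surface normal n."""
    d = sum(ri * ni for ri, ni in zip(r, n))
    return tuple(ri - 2.0 * d * ni for ri, ni in zip(r, n))


def latlong_pixel(direction, width, height):
    """Map a unit direction to (column, row) in a latitude-longitude image.

    Assumed convention: +Y is up (row 0), and -Z maps to the image center column.
    """
    x, y, z = direction
    theta = math.acos(max(-1.0, min(1.0, y)))   # polar angle measured from +Y
    phi = math.atan2(x, -z)                     # azimuth in [-pi, pi]
    col = int((phi / (2.0 * math.pi) + 0.5) * width) % width
    row = min(int(theta / math.pi * height), height - 1)
    return col, row


def environment_lookup(env, view_ray, normal):
    """Return the environment value seen in a mirror reflection at a surface point."""
    col, row = latlong_pixel(reflect(view_ray, normal), len(env[0]), len(env))
    return env[row][col]


# A ray travelling straight down onto an upward-facing mirror reflects straight up.
up = reflect((0.0, -1.0, 0.0), (0.0, 1.0, 0.0))

# A tiny 8 x 4 environment whose top row is labeled "sky".
env = [["sky"] * 8] + [["ground"] * 8 for _ in range(3)]
seen = environment_lookup(env, (0.0, -1.0, 0.0), (0.0, 1.0, 0.0))
```

A production implementation would typically interpolate between neighboring pixels instead of taking the nearest one.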

The environment-mapped rendering gives the appearance that the objects are 
reflecting the environment. However, the appearance is somewhat strange because 
we do not see reflections of the spheres in the table (the spheres appear to float
above it). We can compare this rendering to the corresponding IBL rendering of 
mirror-like objects in Figure 9.39(b), which exhibits appropriate interreflections. 
If the scene were a single convex object, however, the two renderings would be the 
same. 

It is interesting to note that this form of environment mapping does not require 
that the environment map be higher in its dynamic range than the final display. 
Because every pixel in the rendering comes directly from the environment map, 
clipping the pixel values of the environment map image would be unnoticeable on 

FIGURE 9.39 (a) A shiny version of the test scene rendered using environment mapping.
(b) A shiny version of the scene rendered with ray-tracing-based IBL, producing interreflections.


a similarly clipped display, unless the rendering were to exhibit significant motion 
blur or image defocus. 

Environment mapping can also be used to simulate the reflection of an environ- 
ment by surfaces with non-mirror reflectance properties, by pre-convolving the image 
of the environment by various convolution filters [186,171,155,174]. This takes 
advantage of an effect noted by Ramamoorthi and Hanrahan [195] that a detailed 
environment reflected in a rough specular surface looks similar to a blurred en- 
vironment reflected in a mirror-like surface. Often, a specular Phong cosine lobe 
[194] is used as the convolution filter. 

To simulate Lambertian diffuse reflection with environment mapping, a hemi- 
spherical cosine lobe is used as the convolution filter, yielding an irradiance environment 
map. For diffuse reflection, one indexes into the irradiance environment map us- 
ing the object point’s surface normal direction N rather than the reflected vector 
R’. Convolving the image can be computationally expensive, but because irradi- 
ance images lack sharp detail a close approximation can be made by convolving a 
low-resolution version of as few as 32 x 16 pixels in latitude-longitude format. 
Cabral et al. [155] suggested that such a convolution could be performed efficiently 
on a spherical harmonic (SH) decomposition of the incident illumination, and Ra- 
mamoorthi and Hanrahan [195] noted that computing the SH reconstruction of an 

FIGURE 9.40 (a) HDR Grace Cathedral light probe image, in the mirrored sphere format.
(b) The light probe in a shown with lower exposure, revealing details in the bright regions.
(c) A specular convolution of a. (d) A diffuse convolution of a, showing how this lighting
environment would illuminate a diffuse sphere. (e) An LDR environment map version of a with
clipped pixel values. (f) The image in e with lower exposure, showing that the highlights have been
clipped. (g) A specular convolution of e, showing inaccurately reduced highlight size and intensity
relative to c. (h) A diffuse convolution of e, yielding an inaccurately dark and desaturated rendering
compared to d.


environment using the first nine terms (orders 0, 1, and 2) of the SH decomposition
approximates the diffuse convolution of any lighting environment to within
99% accuracy.⁷


⁷ This technique for computing an irradiance environment map can yield regions with negative pixel values when applied
to an environment with concentrated light sources due to the Gibbs ringing phenomenon.
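
The nine-term approximation can be sketched compactly. The example below uses the standard real spherical harmonic basis for orders 0 to 2 and the per-order cosine-lobe weights π, 2π/3, and π/4. It is a simplified illustration: it handles a single color channel, takes the environment as a function of direction instead of an image, uses +Z as the pole of the integration grid, and integrates on an assumed 32 × 16 midpoint grid. The function names are hypothetical.

```python
import math


def sh_basis(d):
    """Real spherical harmonics of orders 0, 1, and 2 for a unit direction."""
    x, y, z = d
    return (
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    )


# Cosine-lobe convolution weights for orders 0, 1, 2 (one entry per basis function).
A = (math.pi,) + (2.0 * math.pi / 3.0,) * 3 + (math.pi / 4.0,) * 5


def project_environment(radiance, width=32, height=16):
    """Integrate radiance(direction) against the nine basis functions."""
    coeffs = [0.0] * 9
    d_theta, d_phi = math.pi / height, 2.0 * math.pi / width
    for row in range(height):
        theta = (row + 0.5) * d_theta
        for col in range(width):
            phi = (col + 0.5) * d_phi
            d = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            # radiance times the solid angle of this grid cell
            weight = radiance(d) * math.sin(theta) * d_theta * d_phi
            for i, y_i in enumerate(sh_basis(d)):
                coeffs[i] += weight * y_i
    return coeffs


def irradiance(coeffs, normal):
    """Evaluate the nine-term irradiance approximation for a unit normal."""
    return sum(a * c * y for a, c, y in zip(A, coeffs, sh_basis(normal)))


uniform = project_environment(lambda d: 1.0)
sky_only = project_environment(lambda d: 1.0 if d[2] > 0.0 else 0.0)
e_uniform = irradiance(uniform, (0.0, 0.0, 1.0))
e_sky_up = irradiance(sky_only, (0.0, 0.0, 1.0))
e_sky_down = irradiance(sky_only, (0.0, 0.0, -1.0))
```

For a uniform environment of radiance 1 the result is close to π for any normal. For an environment that is lit only from above, the upward-facing value is close to π and the downward-facing value is close to zero; small deviations, including slightly negative values, come from the coarse grid and the truncated series.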

FIGURE 9.41 (a) A scene environment mapped with the diffuse convolution of the Grace Cathedral
environment. (b) An ambient occlusion map obtained using IBL to light the scene with a
homogeneous white lighting environment. (c) The ambient occlusion map multiplied by a. (d)
The scene illuminated using standard IBL from the light probe image. For this scene, the principal
difference from c is in the shadow regions, which have more directional detail in d.


Figure 9.41(a) shows a scene rendered using diffuse environment mapping. Because
environment mapping does not simulate self-shadowing, the image is not as
realistic a rendering of the scene as the IBL solution shown in Figure 9.41(d). However,
if the scene were a single convex object the two renderings would again be the
same. In general, environment mapping produces more convincing results for specular
reflection than for diffuse reflection. As a result, for objects with both specular
and diffuse reflectance components it is common for environment mapping to be
used for the specular reflection and traditional lighting for the diffuse component.
The technique of reflection occlusion [179] can further increase the realism of specular
environment mapping by tracing reflected rays from each surface point to determine
if the environment’s reflection should be omitted due to self-occlusion.

When the incident illumination is blurred by a convolution filter, it becomes 
necessary that the environment map cover the full dynamic range of the incident 
illumination to obtain accurate results. Figure 9.40 shows a comparison of using 
an LDR environment map versus an HDR light probe image for rendering a diffuse 
sphere using convolution and environment mapping. 


9.8.2 AMBIENT OCCLUSION 


Ambient occlusion [179] can be used to approximate image-based lighting using 
a single-bounce irradiance calculation under the assumption that the IBL lighting 
environment is relatively even. The technique leverages the key IBL step of firing rays 
out from object surfaces to estimate the amount of light arriving from the visible 
parts of the environment, but uses diffuse environment mapping to determine the 
coloration of the light at the surface point. The result is an approximate but efficient 
IBL process that can perform well with artistic guidance and that avoids noise from 
the light sampling process. 

The first step in ambient occlusion is to use an IBL-like process to render a 
neutral diffuse version of the scene as illuminated by a homogeneously white il- 
lumination environment. In this step, the surfaces of the scene are set to a neu- 
tral diffuse reflectance so that the rendered image produces pixel values that are 
proportional to a surface point’s irradiance. This step can be performed using 
the Monte Carlo ray-tracing process (described in Section 9.5) or by convert- 
ing the white lighting environment into a constellation of light sources (Sec- 
tion 9.6.2). In the latter case, the rendering can be formed by adding together 
scan-line renderings of the scene lit from each lighting direction, with the shad- 
ows being calculated by a shadow buffer algorithm. Usually, no additional light 
bounces are simulated when computing this rendering. This estimate of the irradiance
from the environment at each point is called the ambient occlusion map, seen in
Figure 9.41(b).

The next step is to multiply the ambient occlusion map by an image of the scene 
environment mapped with the diffuse convolution of the lighting environment, as 
in Figure 9.41(a). The product, seen in Figure 9.41(c), applies the self-shadowing 
characteristics of the ambient occlusion map to the diffuse lighting characteristics 
of the environment-mapped scene, yielding a rendering that is considerably more 
realistic. If the scene has different surface colors, this rendering can be multiplied by 
an image of the diffuse color of each point in the scene, which approximates hav- 
ing differently colored surfaces during the original rendering. If needed, specular 
reflections can be added using either ray tracing or specular environment map- 
ping. 
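
The two steps can be sketched for a single surface point. The example below is a minimal Monte Carlo illustration under strong assumptions: the shaded point lies on a ground plane with normal +Y, the only occluder is one sphere, and rays are drawn with cosine-weighted hemisphere sampling so that the unoccluded fraction is proportional to the irradiance from a uniform white environment. The function names, the scene, and the color values are made up for the example.

```python
import math
import random


def hits_sphere(origin, direction, center, radius):
    """True if the ray origin + t * direction (t > 0) intersects the sphere."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(a * d for a, d in zip(oc, direction))
    c = sum(a * a for a in oc) - radius * radius
    disc = b * b - c
    return disc >= 0.0 and (-b - math.sqrt(disc)) > 1e-6


def ambient_occlusion(point, center, radius, samples=4000, seed=1):
    """Fraction of cosine-weighted upward rays from a ground point that escape.

    The ground normal is assumed to be +Y and the only occluder is one sphere.
    """
    rng = random.Random(seed)
    unoccluded = 0
    for _ in range(samples):
        u1, u2 = rng.random(), rng.random()
        r, phi = math.sqrt(u1), 2.0 * math.pi * u2
        # Cosine-weighted direction: sample a disk and project up onto the hemisphere.
        direction = (r * math.cos(phi), math.sqrt(1.0 - u1), r * math.sin(phi))
        if not hits_sphere(point, direction, center, radius):
            unoccluded += 1
    return unoccluded / samples


def shade(ao, diffuse_env_rgb, albedo_rgb):
    """Ambient occlusion times diffuse environment lookup times surface color."""
    return tuple(ao * e * a for e, a in zip(diffuse_env_rgb, albedo_rgb))


sphere_center, sphere_radius = (0.0, 1.0, 0.0), 0.5
ao_under = ambient_occlusion((0.0, 0.0, 0.0), sphere_center, sphere_radius)
ao_far = ambient_occlusion((10.0, 0.0, 0.0), sphere_center, sphere_radius)
pixel = shade(ao_under, (0.8, 0.7, 0.6), (0.5, 0.5, 0.5))
```

Directly beneath the sphere, which covers a 30-degree cone overhead, the expected value is 0.75; far from the sphere the value approaches 1. Because the occlusion term depends only on geometry, it can be reused with any diffuse environment lookup.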

Ambient occlusion does not precisely reproduce how a scene would appear as 
illuminated by the light probe using standard IBL. We can see, for example, dif- 
ferences in comparing Figures 9.41(c) and 9.41(d). Most notably, the shadows in 
the ambient occlusion rendering are much softer than they appear in the standard 
IBL rendering. The reason is that the ambient occlusion rendering is computed as 
if from a completely diffuse lighting environment, whereas a standard IBL render- 
ing computes which specific parts of the lighting environment become occluded 
for each part of the shadowed area. The ambient occlusion result can be improved 
to some extent using bent normals [179], where the diffuse convolution of the light 
probe image is mapped onto the object surfaces according to the average direction 
of unoccluded light, rather than the true surface normal. However, because the sur- 
face colors are still sampled from a diffuse convolution of the light probe image, the 
ambient occlusion rendering will lack the shading detail obtainable from sampling 
the light probe image directly. 

Ambient occlusion most accurately approximates the correct lighting solution 
when the lighting environment is relatively diffuse. In this case, the homogeneous 
environment used to compute the occlusion is a close approximation to the envi- 
ronment desired to light the scene. Ambient occlusion is not designed to simulate 
light from environments that include concentrated light sources, as the directional 
detail of the environment is lost in the diffuse convolution process. For IBL environ- 
ments that do have concentrated light sources, an effective way of handling them 
is to simulate them as direct light sources (as described in Section 9.6.1), delete
them from the IBL environment, and use a diffuse convolution of the modified IBL
environment to multiply the ambient occlusion map.

Although computing ambient occlusion maps requires sending out a multitude 
of rays to the lighting environment, the number of rays that need to be sent is 
minimized because the environment has minimal variance, which alleviates the 
sampling problem. Also, the ambient occlusion map is solely a function of the 
object geometry and is independent of the lighting environment. Because of this, 
the technique can be used to render an object with different lighting environments 
while performing the ambient occlusion map calculation only once. This makes
real-time implementations very fast, especially for rotating lighting environments 
for which performing additional diffuse convolutions is also unnecessary. In addi- 
tion, the technique allows for relighting effects to be performed inside a standard 
compositing system. For example, the convolved light probe image can be manually 
edited and a relit version of the scene can be created quickly using the preexisting 
normals and ambient occlusion map without rerendering. 


9.9 IMAGE-BASED LIGHTING FOR REAL OBJECTS AND PEOPLE


The IBL techniques described so far are useful for lighting synthetic objects and 
scenes. It is easy to imagine uses for a process that could illuminate real scenes, ob- 
jects, and people with IBL environments. To do this, one could attempt to build 
a virtual model of the desired subject’s geometry and reflectance and then illu- 
minate the model using the IBL techniques already presented. However, creating 
photoreal models of the geometry and reflectance of objects (and particularly peo- 
ple) is a difficult process, and a more direct route would be desirable. In fact, 
there is a straightforward process for lighting real subjects with IBL that requires 
only a set of images of the subject under a variety of directional lighting condi- 
tions. 


9.9.1 A TECHNIQUE FOR LIGHTING REAL SUBJECTS 


The technique is based on the fact that light is additive, which can be described simply 
as follows. Suppose we have two images of a subject, one lit from the left and one 
lit from the right. We can create an image of the subject lit with both lights at once
simply by adding the two images together, as demonstrated by [172]. If the image
pixel values are proportional to the light in the scene, this process yields exactly 
the right answer, with all of the correct shading, highlights, and shadows the scene 
would exhibit under both light sources. Furthermore, the color channels of the two 
images can be independently scaled before they are added, allowing one to virtually 
light the subject with a bright orange light to the right and a dim blue light to the 
left, for example. 

As we have seen in Section 9.6.2, an IBL lighting environment can be simulated 
as a constellation of light sources surrounding the subject. If one could quickly light 
a person from a dense sampling of directions distributed across the entire sphere of 
incident illumination, it should be possible to recombine tinted and scaled versions 
of these images to show how the person would look in any lighting environment. 
The Light Stage device described by [162] (Figure 9.42) is designed to acquire 
precisely such a data set. The device’s 250-watt halogen spotlight is mounted on 
a two-axis rotation mechanism such that the light can spiral from the top of the 
sphere to the bottom in approximately one minute. During this time, a set of digital 
video cameras can record the subject’s appearance as illuminated by hundreds of 
lighting directions distributed throughout the sphere. A subsampled light stage data 
set of a person’s face is seen in Figure 9.43(a).

Figure 9.43(c) shows the Grace Cathedral lighting environment remapped to be 
the same resolution and in the same longitude-latitude space as the light stage data 
set. For each image of the face in the data set, the remapped environment indicates 
the color and intensity of the light from the environment in the corresponding 
direction. Thus, we can multiply the red, green, and blue color channels of each 
light stage image by the amount of red, green, and blue light in the correspond- 
ing direction in the lighting environment to obtain a modulated image data set, as 
in Figure 9.43(d). Adding all of these images together then produces an image of 
the subject as illuminated by the complete lighting environment, as seen in Fig- 
ure 9.44(a). Results obtained for three more lighting environments are shown in 
Figures 9.44(b) through 9.44(d). 
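
The relighting computation is a per-channel weighted sum over the lighting directions. The sketch below is a minimal illustration, assuming each basis image is a flat list of linear RGB pixels and that the environment has already been resampled to one RGB weight per light stage direction. The function name and the sample values are hypothetical.

```python
def relight(basis_images, light_colors):
    """Sum the light stage images, each tinted by the environment color for its direction.

    basis_images -- one image per lighting direction; each image is a flat list of
                    (r, g, b) pixels with values proportional to scene radiance
    light_colors -- one (r, g, b) weight per lighting direction, assumed here to be
                    the environment already resampled to the light stage directions
    """
    n_pixels = len(basis_images[0])
    result = [[0.0, 0.0, 0.0] for _ in range(n_pixels)]
    for image, color in zip(basis_images, light_colors):
        for p, pixel in enumerate(image):
            for c in range(3):
                result[p][c] += pixel[c] * color[c]
    return [tuple(px) for px in result]


# Two one-pixel images: the subject lit from the right and from the left.
lit_right = [(0.4, 0.4, 0.4)]
lit_left = [(0.2, 0.2, 0.2)]
# A bright orange light on the right and a dim blue light on the left.
orange, blue = (2.0, 1.0, 0.0), (0.0, 0.0, 0.5)
relit = relight([lit_right, lit_left], [orange, blue])
```

With these values the relit pixel is approximately (0.8, 0.4, 0.1). The same loop applies unchanged to hundreds of directions and full-resolution images.
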

FIGURE 9.42 (a) The Light Stage 1 device for lighting a person’s face from the full sphere of
incident illumination directions. (b) A one-minute exposure taken during a data set acquisition.
Images recorded from the right video camera are shown in Figure 9.43.


9.9.2 RELIGHTING FROM COMPRESSED IMAGE DATA SETS


Computing the weighted sum of the light stage images is a simple computation, 
but it requires accessing a large amount of data to create each rendering. This 
process can be accelerated by performing the computation on compressed versions 
of the original images. In particular, if the images are compressed using an or- 
thonormal transform such as the discrete cosine transform (DCT), the linear com- 
bination of the images can be computed directly on the basis coefficients of the 
compressed images [197]. The downloadable Facial Reflectance Field Demo [203] 
(www.debevec.org/FaceDemo/) uses DCT-compressed versions of light stage data sets to
allow a user to interactively relight a face using either light probe images or user- 
controlled light sources in real time. 

FIGURE 9.43 (a) Light stage data of a face illuminated from the full sphere of lighting
directions. The image shows 96 images sampled from the 2,000-image data set. (b) The Grace
Cathedral light probe image. (c) The Grace probe resampled into the same longitude-latitude space
as the light stage data set. (d) Face images scaled according to the color and intensity of the
corresponding directions of illumination in the Grace light probe. Figure 9.44(a) shows the face
illuminated by the Grace probe created by summing these scaled images.

FIGURE 9.44 Renderings of the light stage data set from Figure 9.43 as illuminated by four
image-based lighting environments: (a) Grace Cathedral, (b) Eucalyptus grove, (c) Uffizi Gallery,
and (d) St. Peter's Basilica.


A light stage data set can be parameterized by the four dimensions of image
coordinates (u, v) and lighting directions (θ, φ). Choosing a particular pixel (u, v) on
the subject, we can create a small image (called the pixel’s reflectance function) from the
color the pixel reflects toward the camera for all incident lighting directions (θ, φ)
(Figure 9.45). In the Facial Reflectance Field Demo, the 4D light stage data sets are
actually DCT compressed in the lighting dimensions rather than the spatial dimen- 
sions, exploiting coherence in the reflectance functions rather than in the images 
themselves. When the DCT coefficients of the reflectance functions are quantized 
(as in JPEG compression), up to 90% of the data maps to zero and can be skipped 
in the relighting calculations, enabling real-time rendering. The process of relight- 
ing a single pixel of a light stage data set based on its reflectance function is shown 
in Figure 9.45. 
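
For an orthonormal transform, the dot product of two coefficient vectors equals the dot product of the original samples, so a pixel can be relit by multiplying its reflectance-function coefficients with the lighting coefficients and summing. The sketch below shows how quantization lets the zero coefficients be skipped. It handles a single color channel, assumes the coefficients have already been computed, and uses a uniform quantization step. The names `quantize` and `relight_pixel` are hypothetical.

```python
def quantize(coeffs, step):
    """Quantize coefficients to multiples of `step`, keeping only nonzero ones.

    Returns a dict mapping coefficient index to its dequantized value.
    """
    sparse = {}
    for k, c in enumerate(coeffs):
        q = round(c / step)
        if q != 0:
            sparse[k] = q * step
    return sparse

def relight_pixel(sparse_reflectance, light_coeffs):
    """Dot product over the surviving coefficients only."""
    return sum(value * light_coeffs[k]
               for k, value in sparse_reflectance.items())
```

The work per pixel is proportional to the number of nonzero coefficients rather than to the number of lighting directions, which is the source of the speedup described above. The result is an approximation whose error grows with the quantization step.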

FIGURE 9.45 Relighting a reflectance function can be performed on the original pixel values
(top row) or on DCT coefficients of the illumination data.

This image-based relighting process can also be applied in the domain of
computer-generated objects. One simply needs to render the object under an array
of different lighting conditions to produce a virtual light stage data set of the
object. This can be useful in that the basis images can be rendered using high- 
quality offline lighting simulations but then recombined in real time through the 
relighting process, maintaining the quality of the offline renderings. In the context 
of CG objects, the content of a reflectance function for a surface point is its precom- 
puted radiance transfer. Sloan et al. [196], Ramamoorthi and Hanrahan [195], and Ng 
et al. [189] have noted that the basis lighting conditions need not be rendered with
point source illumination. Specifically, Sloan et al. [196] and Ramamoorthi and 
Hanrahan [195] use the Spherical Harmonic (SH) basis, whereas Ng et al. [189] 
use a wavelet basis. These techniques demonstrate varying the viewpoint of the ob- 
ject by mapping its radiance transfer characteristics onto a 3D geometric model of 
the object. Whereas these earlier techniques have been optimized for diffuse sur- 
face reflectance in low-frequency lighting environments, Liu et al. [182] use both 
a wavelet representation and clustered principal components analysis of PRT func- 
tions to render view-dependent reflections from glossy objects in high-frequency 
lighting environments, producing sharp shadows from light sources interactively. 
Sample renderings made using these techniques are shown in Figure 9.46. Using
the GPU to perform image-based relighting on CG objects with techniques such as
these promises to become the standard method for using IBL in video games and
other interactive rendering applications.

FIGURE 9.46 Interactive IBL renderings of 3D objects using basis decompositions of the lighting
environment and surface reflectance functions. (a) Max Planck model rendered in the RNL
environment using precomputed radiance transfer [196] based on spherical harmonics. (b) Armadillo
rendered in the Uffizi Gallery using a spherical harmonic reflection map [195]. (c) Teapot rendered
into Grace Cathedral using a 2D Haar transform [189] of the lighting and reflectance, achieving
detailed shadows. (d) Teapot rendered into St. Peter’s Basilica using precomputed radiance transfer
represented by Haar wavelets and compressed with clustered principal component analysis [182] to
produce sharp shadow detail, as seen on the teapot lid.


9.10 CONCLUSIONS 


In this chapter we have seen how HDR images can be used as sources of illumina- 
tion for both computer-generated and real objects and scenes through image-based 
lighting. Acquiring real-world lighting for IBL involves taking omnidirectional HDR 
images through one of several techniques, yielding a data set of the color and in- 
tensity of light arriving from every direction in the environment. This image of the 
incident illumination is mapped onto a surface surrounding the object, and a light- 
ing simulation algorithm is used to compute how the object would appear as if lit 
by the captured illumination. With the appropriate optimizations, such images can 
be computed efficiently using either global illumination or traditional rendering 
techniques, and recent techniques have allowed IBL to happen in real time. Real ob- 
jects can be illuminated by new environments by capturing how they appear under 
many individual lighting conditions and then recombining them according to the 
light in an IBL environment. 

The key benefit of IBL is that it provides a missing link between light in the real 
world and light in the virtual world. With IBL, a ray of light can be captured by 
an HDR camera, reflected from a virtual surface in a rendering algorithm, and be 
turned back into real light by a display. The IBL process has a natural application 
wherever it is necessary to merge CG imagery into real scenes, in that the CG can 
be lit and rendered as if it were actually there. Conversely, the light stage technique 
allows IBL to illuminate real-world objects with the light of either virtual or real 
environments. In its applications so far, IBL has rendered virtual creatures into real 
movie locations, virtual cars onto real roads, virtual buildings under real skies, and 
real actors into virtually created sets. For movies, video games, architecture, and 
design, IBL can connect what is real with what can only be imagined. 
Appendix A


LIST OF SYMBOLS 


Symbol        Description

⊗             Convolution operator
α             A channel of Lαβ color space
α             The key of a scene
β             A channel of Lαβ color space
γ             Exponent used for gamma correction
σ             Semisaturation constant
a             Color opponent channel of L*a*b* color space; color opponent
              channel used in CIECAM02
A             CIE standard illuminant approximating incandescent light
A             Achromatic response, computed in CIECAM02
b             Color opponent channel of L*a*b* color space; color opponent
              channel used in CIECAM02
B             CIE standard illuminant approximating direct sunlight
c             Viewing condition parameter used in CIECAM02
C             CIE standard illuminant approximating indirect sunlight
C             Chroma
C*_ab         Chroma, computed in L*a*b* color space
C*_uv         Chroma, computed in L*u*v* color space
D             Density, computed as the log of luminance L_v
D             Degree of adaptation, used in CIECAM02
D55           CIE standard illuminant with a correlated color temperature
              of 5503 Kelvin (K)
D65           CIE standard illuminant with a correlated color temperature
              of 6504 Kelvin (K)
D75           CIE standard illuminant with a correlated color temperature
              of 7504 Kelvin (K)
ΔE*_94        1994 CIE color difference metric
ΔE*_ab        Color difference measured in L*a*b* color space
ΔE*_uv        Color difference measured in L*u*v* color space
e             Eccentricity factor, used in CIECAM02
E             CIE equal-energy illuminant
E_e           Irradiance, measured in watts per square meter
E_v           Illuminance, measured in lumens per square meter
F             Viewing condition parameter used in CIECAM02
F2            CIE standard illuminant approximating fluorescent light
F_L           Factor modeling partial adaptation, computed using the
              adapting field luminance in CIECAM02
h             Hue angle as used in CIECAM02
h_ab          Hue, computed in L*a*b* color space
h_uv          Hue, computed in L*u*v* color space
H             Appearance correlate for hue
I             Catch-all symbol used to indicate an arbitrary value
I_e           Radiant intensity, measured in watts per steradian
I_v           Luminous intensity, measured in lumens per steradian or candela
J             Appearance correlate for lightness
L             Luminance
L_a           Adapting field luminance
L_d           Display luminance
L_e           Radiance, measured in watts per steradian per square meter
L_v           Luminance, measured in candela per square meter
L_w           World or scene luminance (also Y_w)
Lαβ           Color opponent space
LMS           Color space approximating the output of cone photoreceptors
L*a*b*        CIE color space, also known as CIELAB
L*u*v*        CIE color space, also known as CIELUV
M             Appearance correlate for colorfulness
M_BFD         Bradford chromatic adaptation transform
M_CAT02       CAT02 chromatic adaptation transform
M_e           Radiant exitance, measured in watts per square meter
M_H           Hunt-Pointer-Estevez transformation matrix
M_vonKries    von Kries chromatic adaptation transform
M_v           Luminous exitance, measured in lumens per square meter
N_c           Viewing condition parameter used in CIECAM02
P_e           Radiant power, measured in watts (W) or joules per second
P_v           Luminous power, measured in lumens (lm)
Q             Appearance correlate for brightness
Q_e           Radiant energy, measured in joules (J)
Q_v           Luminous energy, measured in lumen seconds
iP Surface reflectance 
R             Photoreceptor response
RGB           A generic red, green, and blue color space
R_d G_d B_d   Red, green, and blue values scaled within the displayable
              range
R_w G_w B_w   Red, green, and blue values referring to a world or scene
              color
s             Saturation parameter
s             Appearance correlate for saturation
s_uv          Saturation, computed in L*u*v* color space
t             Magnitude factor, used in CIECAM02
T             Correlated color temperature, measured in Kelvin (K)
V(λ)          CIE photopic luminous efficiency curve
XYZ           CIE-defined standard tristimulus values
xyz           Normalized XYZ tristimulus values
Y             Y component of an XYZ tristimulus value, indicating CIE
              luminance
Y_w           World or scene luminance (also L_w)
Y_b           Relative background luminance
YC_BC_R       Color opponent space used for the JPEG file format
Physically-based rendering, 87

Pixar, 97, 98-99

Pixim, 162-163

PIZ, 98 
Point Grey Research, 163-164

Point spread function (PSF), 
ve) 

Power law function, 69—70 

Primaries, imaginary versus 
real, 31 

Principal components analysis 
(PCA), 52-53 

Printing 

— film, 116-117 

— presses, 167-168 

— reflection, 168-170 

Probe mapping, 416 

Project-based display, 
179-181 

Pyramids, use of image, 
126-128 


QuickTime VR, 392 


RADIANCE light simulation 
system, 371-385, 415 

Radiance maps, 7, 117 

Radiance picture format, 91 

Radiant energy, 19, 20 

Radiant exitance, 20, 21 

Radiant intensity, 20, 22 

Radiant power/flux, 20 

Radiometry, 19-24 

Radiosity, 368 

Rahman retinex, 281—286 

Rational quantization 
function, 208 

Raw image formats (RAW), 
13, 87 

Ray tracing, 370, 415 

Real subjects, lighting, 
468-473 

Reflectance, 19 

— function, 473 
— standards, 402-403

Reflection occlusion, 462 

Reflection print, 168—170 

Reinhard—Devlin 
photoreceptor model, 
258-266 

Reinhard photographic tone 
reproduction, 305-313 

Relighting, 473-476 

Remote sensing, 87 

Rendering equation, 370 

Rendering with Natural Light 
(RNL), 370-385 

Response-threshold relation, 
201-205 

Retinex theory, 281—286 

RGB color cube, 33 

— converting from XYZ to, 
34-35 

RGB color spaces, standard, 
76-83 


Sampling incident 
illumination, 423—452 

Sampling problem, 427 

Saturation, defined, 63 

Scale factor, Ward 
contrast-based, 246—247 

Scaling images, 189 

Scanning panoramic cameras, 
ISS 

Scene-referred standard,
76-77, 85-86

Schlick uniform rational 
quantization, 273—276 

S-CIELAB, 69 

SECAM (Systeme Electronique 
Couleur Avec Memoire), 
83 


Security cameras, 12—13, 163 

Segmentation, Yee, 316-323 

Shadows and scene-object 
inter-reflection, 
simulating, 452-459 

SIGGRAPH 99 Electronic 
Theater animation Fiat 
Lux, 433—435 

Signal theory, 119 

Silicon Light Machines, 
183-184 

SMaL Camera Technologies, 
T2, LGI=1T62 

Smartvue, 163 

SMPTE-C color space, 83 

SMPTE-240M color space, 79 

Sony, 185 

Spatially variant operator, 
Ashikhmin, 301—305 

Spatial tone reproduction/ 
operators. 
See Tone reproduction/ 
operators, spatial 

Spectral sharpening, 47 

SpheroCam HDR panoramic 
camera, 163 

SpheronVR, 163, 395 

sRGB color space, 77—79 

S-shaped curve, 209 

Steradian, 20 

Still image viewer, 171-176 

Storing images. See File formats 

Sub-band encoding, 99—103 

Sun intensity, use of, 401—407 

Sunnybrook Technologies, 
VSTO 


Tagged Image File Format 
(TIFF) float, 93—97 
Television, 10-11
10-degree color-matching 
function, 28, 29 
Texas Instruments, 182—183 
Thomson Grass Valley, Viper 
FilmStream camera, 
160-161 
3-by-3 matrix transformation, 
34 
Threshold noise, 128-131 
Threshold versus intensity 
(TVI), 192-193, 201, 
205, 210-211
Tiled photographs, 392-394 
Tumblin—Rushmeier 
brightness preserving 
operator, 242—246 
Tone mapping, 17, 77, 86, 
116, 168 
— background intensity, 
BUI) 
— problem, 187-191 
— visual adaptation models for, 
206-211 
Tone reproduction/operators, 
I, SO, IMO) 
— calibration, 225-228 
— color images, 228-231 
— Gaussian blur, 233-235 
— homomorphic filtering, 
231-233 
— performance, 357—362 
— validation studies, 235—237 


Tone reproduction/operators, 


frequency domain, 325 
— Choudhury trilateral 
filtering, 326, 340-345 
— Durand bilateral filtering, 
326, 333-340 
— Oppenheim frequency-
based, 326-333

— performance, 358, 360-362 

Tone reproduction/operators, 
global, 223-224 

— Drago logarithmic mapping,
255-258

— Ferwerda model of visual 
adaptation, 247—252 

— logarithmic and exponential 
mappings, 252-255 

— Miller brightness-ratio 
preserving, 237-241 

— performance, 358, 359-360 

— Reinhard—Devlin 
photoreceptor model, 
258-266 

— Schlick uniform rational 
quantization, 273-276 

— Tumblin-Rushmeier
brightness preserving,
242-246

— Ward contrast-based scale 
factor, 246-247 

— Ward histogram adjustment,
266-272

Tone reproduction/operators, 
gradient domain, 345 

— Fattal compression,
352-357


— Horn lightness computation, 


Aaly SAOSA 
— performance, 358, 360-362 
Tone reproduction/operators, 
local PISIN 


— Ashikhmin spatially variant, 
301-305 

— Chiu spatially variant, 
273-181 

— Fairchild iCAM, 286—292 

— Pattanaik adaptive gain 
control, 313—316 

— Pattanaik multiscale observer 
model, 292—301 

— performance, 358, 359-360 

— Rahman retinex, 281—286 

— Reinhard photographic,
305-313

— Yee segmentation, 316—323 

Tone reproduction/operators, 
spatial 

— global, 223-224, 237-276 

— local, 223, 277-323 

Transparent media, 170—171 

Trilateral filtering, 326, 
340-345 

Tristimulus value, 29, 31 


Uniform rational quantization, 
Schlick, 273-276 


V(λ) (vee-lambda) curve,
25-26 

Viper FilmStream camera, 
160-161 

Virtual reality, 88 

Visible differences predictor 
(VDP), 102 

Visual adaptation 

— dynamics of, 219-221 
— Ferwerda model of,
247-252
— models for tone mapping, 
206-211 


Visual threshold, 192 

von Kries chromatic 
adaptation transform, 
39—40 


Ward contrast-based scale 
factor, 246—247 

Ward HDR transparency 
viewer, 175 

Ward histogram adjustment, 
266-272 

Ward tone-mapping 
algorithm, 210-211 

Weber’s law, 193, 205 

Weighted variance, 147 

Weighting function, 118—121 

White balancing, 37 

Whitening filter, 327 

White point, 34, 35, 36—48 


Wide gamut color space, 83 


XYZ

— color-matching function, 
30-31 

— color space, 34 

— converting to RGB, 34-35 

— scaling, 46 


Yee segmentation, 316—323 